Architectures and Algorithms for Parallel Updates of Raster Scan Displays


Satish  Gupta 

Computer  Science  Department 
Carnegie-Mellon University


December,  1981 




CMU-CS-82-111 




Submitted to Carnegie-Mellon University in partial fulfillment of the requirements for the degree of
Doctor of Philosophy.


Copyright  ©  1982  Satish  Gupta 


This research is supported by the Defense Advanced Research Projects Agency, Department of
Defense, ARPA Order 3597, monitored by the Air Force Avionics Laboratory under contract
F33615-78-C-1551. The views and conclusions contained in this document are those of the authors
and should not be interpreted as representing the official policies, either expressed or implied, of the
Defense Advanced Research Projects Agency or the U.S. Government.




Table  of  Contents 


Acknowledgements 

1.  Introduction 

1.1.  Raster  scan  displays 

1.2.  Background 

1.3.  Thesis  outline 

2.  Display  Memory  Organization 

2.1.  Scan-line  Organization 

2.2.  Symmetric  Organization 

2.3.  Staggered  square  organization 

2.3.1.  Addressing  squares 

2.3.2.  Addressing  horizontal  spans 

2.4.  Conclusion 

3.  Flexible  Memory  Organizations 

3.1.  Row-Column  organization 

3.2.  Row-Column-Square  organization 

3.3. General rectangular organization

3.4.  Conclusion 

4.  BITBLT 

4.1.  BitBlt  operations 

4.2.  BitBlt  implementation 

4.2.1. Nx1 arbitrary access

4.2.2. Nx1 fixed access

4.2.3.  MxM  arbitrary  access 

4.2.4.  MxM  fixed  access 

4.3.  BitBlt  communication 

4.3.1. Nx1 communication

4.3.2.  MxM  communication 

4.3.3.  Non-neighbor  communication 

4.3.4.  Overlapping  communication  with  memory  access 

4.4.  BitBlt  performance 

4.4.1.  Small  area  BitBlt 

4.4.2.  Large  area  BitBlt 

4.4.3.  Analysis 


5.  Line  Drawing 

5.1.  Bit-map  lines 

5.2.  Measure  of  appearance 

5.3.  Bit-map  lines  using  precomputed  strokes 

5.3.1.  The  N-step  algorithm 

5.3.2. The (N - 1)-step algorithm

5.4.  Better  lines  using  a  larger  number  of  strokes 

5.4.1.  Optimal  lines  using  strokes 

5.5.  Total  number  of  strokes 

6.  Filtering 

6.1.  Computing  filtered  images 

6.1.1.  Filtering  straight  edges 

6.1.2.  Universal  Table  for  polygons 

6.2.  Combining  filtered  images 

6.3.  Summary 

7.  Filtered  Scan-Conversion 

7.1.  Filtered  line  drawing 

7.1.1.  Incremental  algorithms  for  filtering  edges 

7.1.2.  Drawing  lines  using  strokes 

7.1.2.1.  Computed  strokes 

7.1.2.2. Precomputed Strokes

7.2.  Filtered  trapezoids 

7.2.1.  Incremental  algorithms  for  filling  trapezoids 

7.2.2.  Filling  trapezoids  using  patches 

7.2.2.1. Computing patches

7.3.  Conclusion 

8.  Image  Processing 

8.1.  Neighbor  transformations 

8.1.1.  Translation 

8.1.2.  Scaling 

8.1.3.  Rotation 

8.2.  Convolution 

8.3.  Conclusion 

9.  Display  Design 

9.1.  Memory  organization 

9.1.1.  Screen  refresh 

9.1.2.  Memory  addressing 

9.1.3.  Masking 

9.2. Interprocessor communication

9.3.  Processor 

9.4.  Display  chip 

9.4.1.  Data  path 

9.4.2.  Memory  interface 

9.4.3. Interprocessor communication

9.4.4.  Op  code 

9.4.5.  Video  buffer 




9.4.6.  Pin  summary 

9.5.  Simulation  and  performance 

9.6.  Design  scalability 

9.7.  Status 

10.  Conclusion 
10.1.  Future  designs 




Acknowledgements 


This  space  has  been  used  by  people  to  thank  everybody  starting  from  their  dog  for  licking  their 
stamps to God for creating the universe. I shall restrict myself only to thanking the people who
directly  helped  with  my  thesis. 

The  foremost  on  the  list  is  Bob  Sproull,  my  thesis  advisor.  Bob  was  an  invaluable  advisor  and  his 
contributions  to  my  thesis  are  innumerable.  Working  with  Bob  is  an  immense  learning  experience 
and  the  experience  I  gained  while  working  with  him  was  extremely  helpful  in  all  of  my  research.  As 
Bob himself has said, advisorship is a tenured position for life, and I intend to hold him to it.

I  also  received  continued  help  and  support  from  the  rest  of  my  thesis  committee  -  Gene  Ball, 
Danny Cohen, Raj Reddy, and Guy Steele. I am sure they set a joint record for the fastest time taken
to read a thesis draft. I am extremely thankful to them for their effort.

It  is  not  possible  to  write  a  thesis  in  the  area  of  computer  graphics  without  acknowledging  Ivan 
Sutherland. I owe the initial ideas in this thesis to him.

And finally, the most important contribution to the successful completion of my thesis came from
all my friends here at CMU. I believe that friends are the most important part of graduate school,
without  whom  life  in  graduate  school  would  be  impossible.  My  list  of  friends  would  be  too  long  to 
include  in  this  page,  so  I  will  simply  say  -  Thank  you,  all! 




Chapter  1 
Introduction 

Computer displays encompass different devices like cathode ray tubes, plotters, and printers. But
the underlying problems for all such devices can be understood by considering only the CRT. This
thesis explores the design space for efficient implementations of CRT displays.

Displays can be built by using either the calligraphic or raster scanning techniques. Calligraphic
displays create images by drawing straight lines from point to point.¹ Complicated electronics are
used to draw these lines and the displays flicker when too many lines are displayed. By contrast,
raster displays create images by scanning the whole display from left to right and top to bottom
repeatedly, which is why they are called raster scan displays. The electron beam's intensity is
modified appropriately for each point in the display (called a pixel). Hence, raster displays must store
the image information as intensity samples for each point of the display. This storage is referred to as
the frame buffer. Unlike calligraphic displays, raster displays offer the advantage of displaying
arbitrarily complex, flicker-free images. They also have the advantage of being able to use
mass-produced television monitor technology and are cheaper as a result.

This thesis discusses raster scan displays only, and ignores calligraphic displays. CRT terminals
used only to display text are also ignored.

1.1. Raster scan displays

In 1970, Ivan Sutherland [Sutherland 70] predicted that raster displays would become a common
form of computer output within a very few years. This prediction did not come true for two reasons:
memory costs were too high and the raster displays built could not be updated fast enough to be
usable for interactive applications. But by now, raster displays are in widespread use.


¹ Some calligraphic displays draw curved segments, but they are few and rare.


Table 1-1 shows the cost of random access memory since 1971 for a 512x512 bit-map display,
which is a display in which each pixel can be only off or on. Early frame buffers used disks and
drums for storage due to the prohibitive cost of random access memory [Terlet 67] [Ophir 68]. When
semiconductors  became  economical,  some  designs  used  LSI  shift  registers  [McCracken  75],  but  since 
these  memories  were  not  very  fast,  the  displays  tended  to  have  low  resolution.  The  resolution  of 
Terlet’s  disk  display  was  320x192  and  McCracken’s  shift  register  display  had  a  resolution  of  256x256. 
It  was  not  until  recently  that  random  access  memory  has  become  cheap  enough  to  be  used  for 
inexpensive  raster  displays. 

Year    Cost of Memory

1971    $2500
1973    $1250
1975    $500
1977    $250
1979    $125
1981    $60

Table 1-1: Cost of a 512x512 bit-map display memory over the past decade.

The  image  on  a  raster  scan  display  has  to  be  continuously  scanned,  which  requires  continual 
memory  access.  This  process  is  usually  referred  to  as  screen  refresh.  To  achieve  high  resolution,  the 
rate  at  which  the  screen  is  refreshed  has  to  be  extremely  fast.  For  example,  a  768x1024  display 
refreshed  60  times  per  second  displays  a  pixel  every  16  nanoseconds.  As  a  result,  the  memory  must 
access  several  pixels  in  parallel  in  order  to  keep  up  with  the  screen  refresh.  For  example,  if  the  access 
time  for  the  memory  is  256  nanoseconds,  at  least  16  pixels  have  to  be  retrieved  in  every  memory 
access  to  maintain  the  required  refresh  rate.  Idle  time  during  horizontal  and  vertical  retrace  would 
then  be  used  to  update  the  frame  buffer. 
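The refresh arithmetic above can be checked with a short calculation. The sketch below is written in Python, which is not a language used in this thesis. The function names and the `active_fraction` parameter (the share of each frame spent outside retrace) are assumptions introduced only for illustration. Ignoring retrace gives about 21 ns per pixel; the 16 ns figure quoted above corresponds to roughly three quarters of each frame being spent on visible pixels.

```python
import math

def pixel_time_ns(width, height, refresh_hz, active_fraction=1.0):
    """Time available to display one pixel, in nanoseconds.

    active_fraction is an assumed share of each frame spent outside
    horizontal and vertical retrace (1.0 ignores retrace entirely).
    """
    frame_time_ns = 1e9 / refresh_hz
    return frame_time_ns * active_fraction / (width * height)

def pixels_per_access(access_time_ns, pixel_ns):
    """Minimum number of pixels each memory access must deliver."""
    return math.ceil(access_time_ns / pixel_ns)

print(round(pixel_time_ns(768, 1024, 60), 1))        # retrace ignored
print(round(pixel_time_ns(768, 1024, 60, 0.75), 1))  # assumed 75% active time
print(pixels_per_access(256, 16))                    # the example in the text
```

With a 256 ns access time and a 16 ns pixel time, each access has to deliver 16 pixels, which matches the figure given in the text.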


To  achieve  high  update  speeds,  several  pixels  have  to  be  updated  simultaneously.  This  speed  is 
extremely  crucial  for  interactive  uses  of  the  display  because  large  amounts  of  information  may  have 
to  be  changed  even  for  conceptually  simple  operations.  This  problem  is  best  illustrated  by  some 
examples. 

A 768x1024 display can show approximately 6000 characters, occupying an area of approximately
500,000 pixels on the display. Table 1-2 shows the time taken to generate a new screenful of
characters assuming various update times for individual pixels. Frame buffer systems in which the
host computer has to update separately every pixel of every character might take 10 µs/pixel, in
which case the total time required to generate a new image is unacceptable. To speed up the process,
new characters can be copied into the frame buffer by writing several pixels in parallel. If 16 pixels
can be read or written in parallel, then a memory access time of approximately 500 ns allows 16 pixels
to be copied in 1 µs, resulting in an update time of 1/16 µs/pixel.

Update time per pixel    Total time for 6000 characters

10 µs                    5 seconds
1 µs                     0.5 seconds
1/16 µs                  30 ms

Table 1-2: Time taken to update a screen of 6000 characters.
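The entries of Table 1-2 follow from multiplying the pixel count by the per-pixel update time. A minimal Python sketch of that calculation is given here; the function name and the constant `PIXELS` are illustrative assumptions, and the 500,000 figure is the approximate area quoted in the text.

```python
def screen_update_seconds(pixels, us_per_pixel):
    """Total time, in seconds, to rewrite `pixels` pixels at the given cost."""
    return pixels * us_per_pixel * 1e-6

PIXELS = 500_000  # approximate area covered by 6000 characters

for label, cost in [("10 us", 10.0), ("1 us", 1.0), ("1/16 us", 1 / 16)]:
    seconds = screen_update_seconds(PIXELS, cost)
    print(f"{label:>8} per pixel: {seconds:.3f} s")
```

The last row evaluates to about 31 ms, which the table rounds to 30 ms.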

The  operation  of  copying  pixels  from  one  part  of  the  frame  buffer  memory  to  another  is  also  required 
when  scrolling  a  window  of  the  frame  buffer.  For  the  scrolling  of  the  entire  768x1024  display  to 
appear smooth it should happen in less than one frame time (e.g. 1/60 of a second). This requires an
update bandwidth of 45 Megapixels/second.

Typical  calligraphic  displays  can  draw  several  thousand  lines  during  each  refresh  period.  If  we 
assume  each  vector  to  be  approximately  100  pixels  long  (about  1/10  of  the  display),  then  a  frame 
buffer display should update at least 3 Megapixels/second in order to emulate a vector display. Table
1-2  shows  that  this  cannot  be  done  unless  several  pixels  are  updated  in  parallel. 

1.2.  Background 

Raster display designs began to use random access memory to store the image only when
memory prices declined sufficiently. Some designs refreshed the screen from the computer memory,
using an elementary peripheral controller [Noll 71] [Thacker 81]. Others implemented the frame
buffer as an independent unit with an interface to allow the host computer to read and write image
data [Denes 75] [Kajiya 75] [Baskett 76]. These two architectures, known as the integral frame buffer
and peripheral frame buffer, are shown in Figures 1-1 and 1-2 respectively. In the first
architecture the memory can be used either by the computer for general and graphics processing or
by the video controller for screen refresh. The second architecture separates the computer memory
from the frame buffer memory, hence increasing the total bandwidth available to the system. Some
designs add a graphics processor between the computer bus and the frame buffer memory to aid in
graphics processing (Figure 1-3). Unfortunately, some computers provide such frame buffers on the
I/O bus, where graphical activity uses the I/O channel and must compete with other channel
activities. Some typical functions included in the graphics processor are screen coordinate to memory
address translation, line drawing, rectangle filling and character printing.


Figure 1-1: Integral frame buffer

Figure 1-2: Peripheral frame buffer

Figure 1-3: Peripheral frame buffer with graphics processor


In all these cases, the memory is organized into words and each word contains more than one pixel
[Thacker 81] [Baskett 76] [Bechtolsheim 80]. These pixels are made to lie along a scan line so that the
video controller can simply load them into a shift register and shift them out as video for the CRT. As
discussed before, this is necessary because of the high data rate required by the refresh. The number
of pixels per word is usually 16 or 32. The host computer accesses the frame buffer in between refresh
accesses and during the CRT horizontal or vertical retrace.




This kind of memory organization influences the algorithms used to update the image. As
mentioned before, it is desirable to update several pixels at a time in order to achieve a fast update of
the whole image. The memory organization calls for these pixels to lie along a scan line. This leads to
the use of what are commonly known as scan line algorithms to create and update images. These
algorithms usually generate the image such that successive pixels written lie along a scan line.

The 8x8 display [Sutherland 81] was designed using a symmetric square organization which allows
access to any 8x8 square of the screen. This design is based on the belief that the regions of the
display that are commonly accessed simultaneously are no more likely to be tall and thin than to be
short and wide. The display is a two-dimensional device, and neither of the axes should be favored by
the design. The 8x8 display is intended for applications which copy pixels from one part of the screen
to another. This operation is known as BitBlt (BLock Transfer of BITs), and is useful for character
generation as well as for scrolling and other rearrangements of images already displayed on the
screen.

The  8x8  display  provided  sufficient  evidence  that  the  symmetric  square  memory  organization  was 
indeed  a  good  idea  for  display  memory.  An  attempt  to  design  a  system  which  would  be  usable  for  a 
larger set of display applications led to this thesis. This thesis presents algorithms for updating
squares  for  a  large  class  of  display  applications  including  BitBlt,  graphics,  and  image  processing. 
These  algorithms  reconfirm  that  the  square  organization  is  indeed  an  excellent  memory  organization 
for  efficient  updates.  The  requirements  imposed  by  these  algorithms  are  then  used  to  present  a 
display  design  for  the  8x8  organization. 

1.3.  Thesis  outline 

Chapter  2  of  this  thesis  defines  the  scan-line  and  square  memory  organizations.  After  presenting 
these  two  organizations  it  introduces  a  staggered  square  organization  which  provides  access  to  both 
the  symmetric  squares  required  for  updates  and  scan-line  spans  convenient  for  screen  refresh. 

The  idea  of  having  a  convenient  memory  organization  for  parallel  updates  can  be  extended  to 
updating  the  exact  geometry  needed  by  individual  applications.  For  example,  the  character  generator 
would  like  to  update  5x7  rectangles,  the  line  drawer  would  like  64x1  for  horizontal  lines,  1x64  for 
vertical lines etc. Chapter 3 discusses methods for allowing such indulgences. These methods are not
implementable due to current technological limitations, but will certainly provide ways of using
future advances in microelectronics. The contents of Chapter 3 are hence not directly related to the
main theme of this thesis and may be omitted by the casual reader.




To  use  fully  the  capabilities  provided  by  convenient  memory  organizations,  all  display  applications 
have to use the provided memory bandwidth efficiently. Chapters 4 and 5 discuss algorithms usable
for  BitBlt  and  line  drawing.  BitBlt  is  a  fairly  general  operator  that  provides  the  ability  to  move  an 
arbitrarily  sized  rectangle  of  the  image  from  one  part  of  the  display  to  another,  and  can  be  used  to 
provide  most  of  the  higher-level  operations  required  by  editing  and  browsing  tasks.  BitBlt  has  been 
implemented  traditionally  by  using  parallel  updates  in  the  scan-line  organization.  Chapter  4  presents 
these  algorithms  as  well  as  the  algorithms  for  the  square  organization.  These  algorithms  are  then 
compared  as  an  aid  in  deciding  which  memory  organization  to  use  under  different  circumstances. 

Line  drawing  is  the  most  frequently  used  graphical  primitive.  Chapter  5  discusses  several 
algorithms  which  generate  several  pixels  along  a  line  in  each  step.  These  algorithms  use  the  square 
memory  organization  and  a  precomputed  set  of  strokes  which  can  be  put  together  like  a  jigsaw  puzzle 
to  form  all  possible  lines.  The  strokes  are  stored  in  a  manner  identical  to  character  fonts  and  are  put 
together  using  BitBlt.  As  can  be  expected,  there  are  tradeoffs,  and  algorithms  that  use  a  larger  base  set 
of  strokes  generate  more  accurate  lines  than  algorithms  that  use  a  smaller  set  of  strokes.  The 
precomputed  strokes  technique  can  be  extended  to  trapezoid  filling  by  using  precomputed  patches. 
Trapezoids  can  then  be  put  together  to  form  polygons. 

Bit-map  graphics  suffer  from  annoying  edges  called  "jaggies".  These  defects  can  be  removed  by 
using  grayscale  pixels  to  smooth  the  edges,  together  with  a  process  known  as  anti-aliasing.  Chapters 
6  and  7  discuss  techniques  and  algorithms  to  render  anti-aliased  graphics.  The  first  of  these  chapters 
discusses  anti-aliasing  techniques.  The  techniques  discussed  focus  on  using  table  lookups  to  ease  the 
burden  of  large  computations.  These  techniques  can  be  implemented  very  efficiently  in  frame  buffer 
systems,  because  extra  frame  buffer  memory  can  be  used  to  store  the  tables.  Chapter  7  discusses 
algorithms  that  use  the  anti-aliasing  techniques  to  provide  high  performance  graphics.  The  tasks 
illustrated are line drawing and trapezoid filling.

The  square  memory  organization  is  also  useful  for  implementing  certain  image  processing 
algorithms.  Chapter  8  presents  algorithms  for  image  processing  that  are  used  to  present  images  in 
more desirable formats. The transformations discussed are translation, scaling, and rotation. While
translation  over  pixel  boundaries  was  the  subject  of  the  discussion  on  BitBlt  in  Chapter  4,  the 
discussion  in  this  chapter  is  generalized  to  translation  over  sub-pixel  distances.  Similarly,  the 
discussion  on  scaling  and  rotation  is  generalized  to  non-integer  scaling  and  non-perpendicular 
rotation. 




Chapter 9 presents the design of a memory system intended for an 8x8 memory organization,
which tries to meet the requirements imposed by the various applications and their algorithms. The
design uses one custom-designed LSI chip, the specifications of which are presented.




Chapter  2 

Display  Memory  Organization 

The  frame  buffer  memory  has  two  primary  properties.  It  can  be  updated  to  change  the  data 
contained  and  hence  produce  new  images.  It  can  also  be  accessed  to  display  the  image  on  output 
devices.  Refresh  type  output  devices  such  as  CRTs  require  continuous  scanning  while  storage  devices 
like  plasma  panels  require  updating  only  when  the  image  changes.  Both  the  update  and  the  output 
operations  require  parallel  access  of  more  than  one  pixel  of  the  image.  To  provide  this  parallelism, 
the  frame  buffer  memory  is  organized  into  words,  where  each  word  contains  more  than  one  pixel. 
The  memory  organization  determines  how  the  pixels  in  each  word  map  onto  the  display.  If  N  pixels 
are  accessed  in  each  memory  access,  then  the  display  can  be  designed  using  N  random  access  memory 
chips, each memory chip being capable of accessing one pixel in each memory access.² Another
aspect  of  the  organization  is  the  mapping  of  the  location  of  the  N  pixels  on  the  display  to  the 
addresses  to  be  used  for  accessing  the  memory  chips. 


Although several pixels can be written in parallel, we also need the ability to update a subset for
occasions when smaller updates are required. This can be done by masking. Masks are N-bit values
that specify which of the N memory chips should be written into. They can essentially be used as
write-enables for the memory chips, in which case a 0 will enable a memory chip and a 1 will abort
the write. Although 2^N masks are possible, only a small set is used in practice. The masks typically
used are the ones that allow the update of physically contiguous pixels in the N-pixel word.

This chapter assumes that the origin of the display coordinate system lies at the top left corner of
the screen. The x-coordinates increase towards the right and the y-coordinates increase towards the
bottom of the screen. In the mathematical notation used in this chapter, "*" stands for multiplication,
"/" for division, and "%" for modulus. The operation <i:j> extracts bits i through j of a word, where
bit 0 is the least significant bit. The "•" operation concatenates bit fields.
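The bit-field notation can be made concrete with two small helpers. This is a minimal Python sketch; the names `bits` and `concat` are hypothetical and do not appear in the thesis, and each field passed to `concat` carries an explicit width because a bare integer does not record how many bits it occupies.

```python
def bits(value, hi, lo):
    """Return bits hi..lo of value (inclusive); bit 0 is least significant.

    Mirrors the notation value<hi:lo>.
    """
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

def concat(*fields):
    """Concatenate (value, width) bit fields, most significant field first."""
    result = 0
    for value, width in fields:
        result = (result << width) | (value & ((1 << width) - 1))
    return result

x = 0b1101010110
print(bin(bits(x, 9, 4)))                # the six high-order bits of x
print(bin(concat((0b10, 2), (0b0011, 4))))
```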


² If each pixel contains more than one bit, then s*N memory chips are used, where s is the number of bits per pixel. If more
than one bit could be accessed in parallel from each chip, then fewer chips would be required.




2.1. Scan-line Organization

Conventional  display  memory  organizations  map  all  pixels  in  each  word  along  a  scan  line  on  the 
display.  This  organization  is  motivated  by  the  high  data  rate  required  by  the  video  refresh  of  CRTs.  A 
768x1024  non-interlaced  display  refreshes  the  screen  at  the  rate  of  16  ns/pixel.  Assuming  the 
memory  cycle  time  to  be  250  ns,  each  memory  access  has  to  provide  at  least  the  next  16  pixels  along 
the  scan-line  for  the  refresh  controller  to  keep  up  with  the  data  rate  required.  If  this  display  is 
implemented  with  16  memory  chips,  then  the  only  time  available  to  update  the  display  is  during  the 
horizontal  and  vertical  retrace  intervals. 

The video controller for a display that uses such a scan-line organization can simply load the pixels
into an N-pixel shift register and shift them out as the video signal for the CRT. If the same address
is given to each of the N memory chips, the N pixels accessed lie along a scan line. With this scheme
the N pixels accessed during each memory cycle are forced to be aligned to the N-bit word boundary
of the screen (Figure 2-1).


[Figure 2-1: the display width is labeled xsize]


If the size of the display is xsize by ysize, then the size of each RAM chip would be (xsize*ysize)/N
pixels. If A(x,y) is the address used to access a pixel located at (x,y) then a possible address mapping
could be

A(x,y) := x/N + y*(xsize/N).

Commonly N, xsize, and ysize are powers of two, which enables the address computation to be
performed by simply using bit-field extractions. As an example, if xsize and ysize are 1024 each and
N is 16 then

RamSize := 64 x 1024 (64K pixels)
and

A(x,y) := x/16 + 64*y
which can be written as

A(x,y) := y<9:0> • x<9:4>

Commercial RAM chips require that the address be split up into two different parts in order to time
multiplex the address lines. These two parts are called the Row Address and the Column Address
respectively (because they are used as decoders along rows and columns in the memory array). A(x,y)
could be split in the following manner to produce 8-bit RA(x,y) and CA(x,y) respectively.

RA(x,y) := y<1:0> • x<9:4>

CA(x,y) := y<9:2>
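The three forms of the address above can be checked against one another. The following Python sketch assumes the example parameters from the text (a 1024x1024 display with N = 16); the function names are illustrative and not taken from the thesis.

```python
N = 16        # pixels per word
XSIZE = 1024  # display width in pixels
YSIZE = 1024  # display height in pixels

def address(x, y):
    """Arithmetic form: A(x,y) = x/N + y*(xsize/N)."""
    return x // N + y * (XSIZE // N)

def address_bitfields(x, y):
    """Bit-field form: A(x,y) = y<9:0> . x<9:4>."""
    return ((y & 0x3FF) << 6) | ((x >> 4) & 0x3F)

def row_col(x, y):
    """Split the address into 8-bit row and column addresses.

    RA = y<1:0> . x<9:4> and CA = y<9:2>.
    """
    ra = ((y & 0x3) << 6) | ((x >> 4) & 0x3F)
    ca = (y >> 2) & 0xFF
    return ra, ca

ra, ca = row_col(37, 5)
print(address(37, 5), address_bitfields(37, 5), (ca << 8) | ra)
```

In this sketch the column address forms the high-order byte of A(x,y) and the row address the low-order byte, so recombining them as `(ca << 8) | ra` reproduces the word address.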

In this memory organization, an update of fewer than N pixels has to be performed by using a
mask which will disable writing some of the memory chips. Assuming that we want to update only
contiguous pixel segments and that this segment can be located at an arbitrary pixel boundary, we
need masks originating and ending at every pixel of the N pixel span. Such masks can be specified by
<a,b>, where a specifies the starting position of the memory chips enabled and b specifies the ending
position, such that a ≥ b (Figure 2-2). In this representation there exist N(N-1)/2 masks. If the value
of the enabled mask bit is 0, then each of these masks can be computed as

for c := 0 to N-1 do
    if (a ≥ c ≥ b) then Mask<c> := 0
    else Mask<c> := 1.

A standard trick is to use only 2N masks instead of N(N-1)/2. All masks of the form <N-1,b>
and <a,0> are precomputed and <a,b> is computed as

<a,b> := <N-1,b> OR <a,0>.

This scheme is used in the Lisp machine [Bawden 77].
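The mask construction and the table-based shortcut can be compared directly. The Python sketch below assumes N = 16 and an enabled bit value of 0, as in the text; the names `mask`, `HIGH`, `LOW`, and `mask_from_tables` are hypothetical.

```python
N = 16

def mask(a, b, n=N):
    """Mask <a,b>: bit c is 0 (write enabled) when a >= c >= b, else 1."""
    m = 0
    for c in range(n):
        if not (a >= c >= b):
            m |= 1 << c
    return m

# The 2N precomputed masks, one pair of tables indexed by bit position.
HIGH = [mask(N - 1, b) for b in range(N)]  # masks of the form <N-1,b>
LOW = [mask(a, 0) for a in range(N)]       # masks of the form <a,0>

def mask_from_tables(a, b):
    """<a,b> := <N-1,b> OR <a,0>."""
    return HIGH[b] | LOW[a]

print(f"{mask(3, 1):016b}")
print(f"{mask_from_tables(3, 1):016b}")
```

The OR works because each precomputed mask inhibits (sets to 1) the chips on one side of the segment, so the union of the two inhibited sets leaves exactly positions b through a enabled.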


Figure 2-2: Mask which enables the subfield <a,b> (bit positions run from N-1 on the left to 0 on the right).




A disadvantage of this kind of word organization is that it divides the display into boundaries
corresponding to the word boundaries in memory. Updates can take place only on an Nx1 grid
aligned to the boundaries of the screen. If it is necessary to write N pixels into the display memory
which are not aligned to the N-pixel boundary on the screen, the write must be split up into two
separate writes to adjoining words of memory. To load an 8x10 character into an 8x1 organized
display would probably take 20 memory cycles and 10 only in the rare case of the character being
aligned with the word boundary.
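The cycle counts in the character example can be reproduced by counting how many aligned words each scan line of the character touches. This Python sketch assumes one memory cycle per word written and ignores any additional read needed to preserve neighboring pixels; the function names are illustrative.

```python
def words_touched(x, width, n):
    """Number of aligned n-pixel words covered by pixels x .. x+width-1."""
    first = x // n
    last = (x + width - 1) // n
    return last - first + 1

def char_write_cycles(x, char_width, char_height, n):
    """Memory cycles to write a character, one cycle per word per scan line."""
    return words_touched(x, char_width, n) * char_height

print(char_write_cycles(0, 8, 10, 8))  # character aligned to a word boundary
print(char_write_cycles(3, 8, 10, 8))  # character straddling a word boundary
```

Only one of the eight possible horizontal offsets is aligned, which is why the text treats the 10-cycle case as rare.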

The scan-line organization can be modified to provide access to any arbitrary span of N pixels.
This is done by addressing different memory chips with separate addresses. As shown in Figure 2-3,
an arbitrary span of N pixels can cross only one word boundary and two addresses are sufficient to
access the span. In addition, the two addresses differ only by one. If we want to access an N-pixel
span whose left edge is located at (x,y), then the addresses used will be A(x,y) and A(x,y)+1 and the
following procedure can be used to determine the address for each chip. The address used depends
upon the offset from the boundary i, and the position of the memory chip which is determined by an
identification number c (0 ≤ c < N). All chips in the range (i ≤ c < N) receive the address A(x,y) and
all chips in the range (0 ≤ c < i) receive the address A(x,y)+1. For N = 16, the address computation
can be expressed as

CA := y<9:2>;
RA := y<1:0> • x<9:4>;
i := x<3:0>;
if (c < i) then RA := RA + 1.

The update algorithms perform better in this case than in the case with aligned boundaries, especially
for  updating  small  regions  of  the  screen  where  splitting  the  update  over  the  boundary  significantly 
increases  the  overhead. 
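The per-chip address selection for an arbitrary span can be sketched as follows. The Python code assumes N = 16 and the 1024x1024 example used earlier; `chip_addresses` is a hypothetical name, and the sketch does not handle a span that runs off the right edge of the screen.

```python
N = 16

def chip_addresses(x, y):
    """Return a list of (RA, CA) pairs, one per chip c, for the span at (x, y).

    Chips with c >= i read from word A(x,y); chips with c < i read from
    the next word, where i = x<3:0> is the offset from the word boundary.
    """
    ca = (y >> 2) & 0xFF                       # CA := y<9:2>
    ra = ((y & 0x3) << 6) | ((x >> 4) & 0x3F)  # RA := y<1:0> . x<9:4>
    i = x & 0xF                                # i  := x<3:0>
    result = []
    for c in range(N):
        chip_ra = ra + 1 if c < i else ra
        result.append((chip_ra, ca))
    return result

for c, (ra, ca) in enumerate(chip_addresses(37, 5)):
    print(c, ra, ca)
```

For the span starting at x = 37 the offset is 5, so chips 0 through 4 address the following word while chips 5 through 15 address the word containing x.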


[Figure 2-3, showing the point (x,y)]


close proximity to the previous ones. They can make use of the fact that updating the addresses for
the next span is easier than recomputing a new set of addresses. After computing the addresses for a
span at (x,y) using the procedure described, the addresses for the span at (x+N,y) can be updated
simply by

RA(x+N,y) := RA(x,y) + 1.

When updates can be aligned to arbitrary pixel boundaries, all smaller updates can be forced such
that the left corners of the array of pixels updated correspond to the left corners of the array of pixels
addressed. With this restriction we need only N masks for the N possible sizes that can be updated,
all of which originate at the left corner. Mask m is characterized by 0s in the leftmost m bits and 1s in
the remaining bits. This mask can be used with a one-to-one correspondence to the write-enables of
the N memory chips only when the corner of the span being accessed abuts the Nx1 grid boundary.
In the case where the span being updated has an offset i from the boundary of the grid, the left bit of
the mask is used as the write-enable for the ith memory chip, the next bit for the (i+1)th memory chip
and so on. This additional complication can be solved in either of two ways. We could either
precompute masks for each value of i, which again requires us to have N^2 masks, or we could use the
N masks and align them to the memory chips depending upon the value of i. This alignment can
be performed by rotating the mask to the right by a rotate count of i and will be the subject of a more
detailed discussion in Chapter 4.
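The alignment of a left-justified mask to the memory chips can be illustrated with a list-based rotation. In this Python sketch a mask is a list whose element 0 is the leftmost position, which sidesteps any assumption about hardware bit order; the function names are hypothetical.

```python
def left_aligned_mask(m, n):
    """Mask m as a list, leftmost position first: 0 enables, 1 inhibits."""
    return [0] * m + [1] * (n - m)

def rotate_right(mask, i):
    """Rotate the mask so that mask position 0 lands on chip i."""
    n = len(mask)
    return [mask[(c - i) % n] for c in range(n)]

# Update 3 pixels of a span that starts 6 chips past the word boundary (N = 8).
write_enables = rotate_right(left_aligned_mask(3, 8), 6)
print(write_enables)
```

In the example the three enabled positions wrap around, so chips 6, 7, and 0 are written.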

2.2.  Symmetric  Organization 

The scan line organization imposes an inherent asymmetry on update operations: horizontal
updates are easier than the vertical ones. The asymmetry is easily seen in the line drawing example. A
horizontal line can be drawn very quickly but only one pixel of a vertical line can be drawn in each
memory cycle. It is for these reasons that the designers of the 8x8 display [Sutherland 81] chose a
symmetric 8x8 organization. The 8x8 display can read or write 64 pixels in each memory cycle; the 64
pixels read or written lie on an 8x8 square on the screen. The complexity of line drawing is now
symmetric with respect to the x and y axes of the screen. This symmetric organization effectively
provides a lower access time for image generation. Successive pixels generated by most image
generators (we have seen characters and lines already) lie close to each other in any direction. This
organization allows them to be updated together, which effectively provides them a low average
access time.

An MxM memory organization requires M^2 memory chips. Each memory chip will provide one
pixel for each MxM square of the screen and, if the memory array is viewed as an MxM square, then
there is a one-to-one correspondence between the memory chips and the pixels of the screen (see
Figure 2-4). The top leftmost bit of each MxM square on the screen comes from the top leftmost chip
of the memory array, the top rightmost bit from the top rightmost chip, the bottom leftmost bit from
the bottom leftmost chip, and so on. If each chip is identified by a column number and a row
number in the array, then point (x,y) on the screen will be located in chip (x%M, y%M).


Figure 2-4: Symmetric Memory Organization (memory chips and display).
Marked pixels are mapped to the top left memory chip.


The address provided to the memory will be determined by the position of the MxM square
accessed. All MxM squares on the screen can be identified by a column number and a row number
(there are xsize/M columns and ysize/M rows). These column and row numbers can easily provide
the column and row addresses required by the memory. The column number of the MxM square is
used as the column address to the memory chips and the row number is used as the row address.
Hence point (x,y) on the screen will be located in chip (x%M, y%M) at address (x/M, y/M) where
x/M is the column address and y/M is the row address.


The addressing scheme described has the same boundary problem that we discussed before. It does
not allow access to an arbitrarily positioned MxM square on the screen in one memory cycle.
However, the 8x8 display allows an arbitrary MxM square to be accessed in one memory cycle by
providing separate addresses to different parts of the memory array. As shown in Figure 2-5, if we
want to access the MxM square positioned at (x,y), then the memory chips in region A will be
addressed by (x/M, y/M), chips in B by (x/M+1, y/M), chips in C by (x/M, y/M+1), and chips in D
by (x/M+1, y/M+1). The square we want to access is offset from the boundaries by i (= x mod M)
and j (= y mod M). If the memory chips are identified by (c,r) (where c is the column number and r
is the row number), then Region A in Figure 2-5 is identified by the memory chips in the range
(i <= c < M, j <= r < M), Region B by (0 <= c < i, j <= r < M), Region C by (i <= c < M, 0 <= r < j),
and Region D by (0 <= c < i, 0 <= r < j). This computation can be represented procedurally as

CA(x,y) := x / M;
RA(x,y) := y / M;
i := x % M;
j := y % M;
if (c < i) then CA := CA + 1;
if (r < j) then RA := RA + 1.
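
The procedure above can be sketched in Python and checked against the per-pixel mapping of the symmetric organization. This is only an illustration: the function names are hypothetical, and integer division is assumed wherever the text writes x/M.

```python
def pixel_location(px, py, M):
    """Chip (c, r) and address (CA, RA) of one pixel in the symmetric organization."""
    return (px % M, py % M), (px // M, py // M)


def square_addresses(x, y, M):
    """Per-chip (CA, RA) for the MxM square whose top-left pixel is (x, y)."""
    i, j = x % M, y % M
    addresses = {}
    for c in range(M):
        for r in range(M):
            ca = x // M + (1 if c < i else 0)  # chips left of the offset wrap to the next column
            ra = y // M + (1 if r < j else 0)  # chips above the offset wrap to the next row
            addresses[(c, r)] = (ca, ra)
    return addresses
```

Every pixel of the requested square should be found in a distinct chip, at exactly the address this procedure hands to that chip.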


Figure 2-5: Accessing an arbitrarily aligned MxM square


These addresses are easily updated for squares in close proximity. The addresses for the square at
(x+M,y) can be updated simply by

CA(x+M,y) := CA(x,y) + 1.

Similarly the addresses for the square at (x,y+M) can be updated by

RA(x,y+M) := RA(x,y) + 1.


Masking in the two-dimensionally symmetric organization is a two-dimensional version of the
masking problem in the scan-line organization. We now need the ability to update rectangles smaller
than the MxM square. As in the scan-line organization we can restrict these rectangles to be updated
such that the upper left corner of the rectangle corresponds to the upper left corner of the MxM
square being updated. With this restriction we need only M^2 masks for all possible rectangles. Mask
(p,q) is defined by 0s in the top left pxq rectangle and 1s in the rest of the square. These
two-dimensional masks can be obtained by ORing two one-dimensional masks. The x-directional
parameter p is used to obtain a mask with 0s in the left p columns and the y-directional parameter q
similarly obtains a mask with 0s in the top q rows. The two masks are ORed to give the desired
rectangular mask.
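
A small sketch of the OR construction, before any alignment to the memory chips. The helper names are hypothetical; masks are stored as `mask[row][column]`, with 0 marking a pixel to be written.

```python
def column_mask(p, M):
    """0s in the left p columns, 1s elsewhere."""
    return [[0 if c < p else 1 for c in range(M)] for _ in range(M)]


def row_mask(q, M):
    """0s in the top q rows, 1s elsewhere."""
    return [[0 if r < q else 1 for _ in range(M)] for r in range(M)]


def rect_mask(p, q, M):
    """Mask (p, q): the bitwise OR of the two one-dimensional masks."""
    cm, rm = column_mask(p, M), row_mask(q, M)
    return [[cm[r][c] | rm[r][c] for c in range(M)] for r in range(M)]
```

Because a bit stays 0 only where both one-dimensional masks are 0, the result has exactly p*q zeros, all in the top left corner.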

As  in  the  case  of  the  scan-line  organization,  these  masks  have  to  be  aligned  to  move  the  bits  to  the 
correct  memory  chip  where  they  can  be  used  as  the  write  enables.  This  alignment  requires  a  two- 
dimensional  rotation  that  will  be  discussed  in  Chapter  4. 

2.3.  Staggered  square  organization 

The unconventional square organization makes raster-scanned screen refresh harder than in the
scan-line organization. The problem arises from the fact that only M pixels out of the M^2 accessed in
one memory access lie along the scan-line being refreshed. The choice is either to use just these M
pixels and ignore the rest, which would result in a very low memory bandwidth utilization for screen
refresh, or to store all the pixels read into a separate buffer and use the buffer for refreshing successive
scan lines. The 8x8 display provides two buffers, each capable of holding 8 scan lines. One of the
buffers is used to refresh the screen while the other buffer is being filled with the next 8 scan lines
from the memory.

A staggered organization of the form shown in Figure 2-6 can be used to provide access to both the
symmetrical squares required by the update operations and the horizontal spans required for the
screen refresh. Successive squares along the x-direction are shifted up by one line. The edge
conditions on the top and bottom edges are wrapped around. This staggering still allows the access of
any arbitrarily positioned MxM square, while also allowing the access of any M^2x1 span of pixels to
ease the screen refresh. Figure 2-6 also shows how this organization allows access to horizontal spans.
The first M pixels of the span are stored in the top row of memory chips, the next M pixels are stored
in the second row, and so on until the final M pixels of the span are stored in the bottom row of
memory chips. The fact that all the pixels in the span are stored in different memory chips allows
them to be accessed in one memory cycle.




Figure  2-6:  Staggering  5x5  squares  to  access  horizontal  spans. 

Figure 2-7 shows the memory mapping of this staggered arrangement. The pixel at (x,y) is located
in the memory chip (x%M, (y+x/M)%M) and is addressed by (x/M, (y+x/M)/M). This mapping is
similar to the unstaggered mapping with the replacement of y by (y+x/M). This replacement
represents the staggering by adding to the value of y the displacement added by the stagger, which
depends upon the x position.
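
As a quick illustration, the staggered mapping can be written down directly and checked for the two access shapes it is meant to support. The function name is hypothetical and integer division is assumed for x/M.

```python
def staggered_location(x, y, M):
    """Chip (c, r) and address (CA, RA) of pixel (x, y) in the staggered organization."""
    s = y + x // M  # y displaced by the stagger, which grows by one per column of squares
    return (x % M, s % M), (x // M, s // M)


def chips_touched(pixels, M):
    """Set of chips used by an iterable of (x, y) pixels."""
    return {staggered_location(x, y, M)[0] for x, y in pixels}
```

Both an arbitrarily placed MxM square and an M^2x1 span starting at a multiple of M^2 touch M^2 distinct chips, so either shape can be read in one memory cycle.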


Figure 2-7: Staggered square organization (memory chips and display).
Marked pixels are mapped to the top left memory chip.




2.3.1.  Addressing  squares

Four different addresses are required to access an arbitrarily positioned MxM square. The
situation is shown in Figure 2-8, which shows an MxM square positioned at (x,y). This square is offset
from the boundaries by i (= x%M) and j (= (y+x/M)%M). Region A is identified by the chips in
the range (i <= c < M, j <= r < M), and is addressed by a column address (CA) of (x/M) and row
address (RA) of ((y+x/M)/M). Region B is identified by (0 <= c < i, j+1 <= r < M) and addressed by
(CA+1, RA). Region C is identified by (i <= c < M, 0 <= r < j) and addressed by (CA, RA+1). Region
D is identified by (0 <= c < i, 0 <= r < j+1) and addressed by (CA+1, RA+1). The addresses can be
procedurally computed in the following manner:

CA := x / M;
RA := (y + x/M) / M;
i := x % M;
j := (y + x/M) % M;
if (c < i) then j := j + 1;
if (c < i) then CA := CA + 1;
if (r < j) then RA := RA + 1.
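
A minimal Python rendering of this procedure, compared against the staggered pixel mapping given earlier. The names are hypothetical and integer division is assumed; note that j is allowed to reach M for chips with c < i, in which case every row of that column group takes RA + 1.

```python
def staggered_location(x, y, M):
    """Chip and address of one pixel: chip (x%M, (y+x/M)%M), address (x/M, (y+x/M)/M)."""
    s = y + x // M
    return (x % M, s % M), (x // M, s // M)


def staggered_square_addresses(x, y, M):
    """Per-chip (CA, RA) for the MxM square at (x, y), following the procedure above."""
    addresses = {}
    for c in range(M):
        for r in range(M):
            ca = x // M
            ra = (y + x // M) // M
            i = x % M
            j = (y + x // M) % M
            if c < i:
                j += 1    # the next column of squares is staggered by one more line
                ca += 1
            if r < j:
                ra += 1
            addresses[(c, r)] = (ca, ra)
    return addresses
```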


Figure 2-8: Accessing an arbitrarily aligned MxM square


Incrementing the addresses for accessing the square at (x,y+M) is simply

RA(x,y+M) := RA(x,y) + 1.

The problem with incrementing the addresses for accessing the squares at (x+M,y) is that the
boundary marking the squares changes its position and hence it is not sufficient to merely increment
the column address. The situation is illustrated in Figure 2-9, which marks the memory chips that
have to increment their row address. In terms of the unmodified offset j, these memory chips are
identified in the range (i <= c < M, r = j) and (0 <= c < i, r = j+1). The value of j has already been
modified to j+1 when c < i. Thus we can combine the two sets into (0 <= c < M, r = j), and can
procedurally represent the incrementation of the addresses for (x+M,y) as

CA(x+M,y) := CA(x,y) + 1;
if (r = (j % M)) then RA(x+M,y) := RA(x,y) + 1.

Further increments along the x-direction can be handled in two ways. In the first, we continue to
modify the value of j so that continued increments can be done in exactly the same way as the first
one. The shift of boundaries merely increments the value of j, and that is all that has to be done to be
able to continuously increment the addresses. The incrementing procedure can now be written as

CA(x+M,y) := CA(x,y) + 1;
if (r = (j % M)) then RA(x+M,y) := RA(x,y) + 1;
j := (j + 1) % M.
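
The first method can be sketched as follows, keeping a private copy of j in every chip. This is an illustration under the assumption that each chip stores its own (possibly modified) j; the function names are hypothetical.

```python
def initial_state(x, y, M):
    """Per-chip (CA, RA, j) for the square at (x, y), computed from scratch."""
    state = {}
    for c in range(M):
        for r in range(M):
            ca, ra = x // M, (y + x // M) // M
            i, j = x % M, (y + x // M) % M
            if c < i:
                j += 1
                ca += 1
            if r < j:
                ra += 1
            state[(c, r)] = (ca, ra, j)
    return state


def step_right(state, M):
    """Move the square from (x, y) to (x+M, y) using increments only."""
    new_state = {}
    for (c, r), (ca, ra, j) in state.items():
        if r == j % M:           # this chip's pixel falls off the staggered boundary
            ra += 1
        new_state[(c, r)] = (ca + 1, ra, (j + 1) % M)
    return new_state
```

Stepping repeatedly should give the same column and row addresses as recomputing them for the new position.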


Figure 2-9: Row addresses to be incremented when square to be accessed is moved horizontally


The second method for continued increments along the x-direction assumes the presence of an
MxM array, RAInc. The row addresses of the memory chips are incremented by this array to compute
the new row addresses. The RAInc array is then updated so that further increments can be done in a
similar manner. This increment for the initial placement is computed as

if (r = (j % M)) then RAInc[c,r] := 1
else RAInc[c,r] := 0.

Since further updates require the value of j to be incremented by 1, the RAInc array can be updated
by circularly shifting it in the y-direction by a single step. The address updating procedure in this
case is

CA(x+M,y) := CA(x,y) + 1;
RA(x+M,y) := RA(x,y) + RAInc[c,r];
RAInc[c,r] := RAInc[c,(M+r-1)%M].
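
A sketch of the RAInc variant, again checked against addresses computed from scratch. The array layout `[c][r]` and the function names are assumptions for illustration; Python booleans are used as 0/1 in the arithmetic.

```python
def direct(x, y, M):
    """Per-chip CA[c][r] and RA[c][r] for the square at (x, y), from scratch."""
    i, t = x % M, y + x // M
    ca = [[x // M + (c < i) for r in range(M)] for c in range(M)]
    ra = [[(t + (c < i)) // M + (r < (t + (c < i)) % M) for r in range(M)]
          for c in range(M)]
    return ca, ra


def ra_inc_init(x, y, M):
    """RAInc[c][r] = 1 where r equals the chip's (modified) j modulo M."""
    i, t = x % M, y + x // M
    return [[1 if r == (t + (c < i)) % M else 0 for r in range(M)] for c in range(M)]


def step_right(ca, ra, inc, M):
    """Apply the three update statements above to every chip."""
    ca = [[ca[c][r] + 1 for r in range(M)] for c in range(M)]
    ra = [[ra[c][r] + inc[c][r] for r in range(M)] for c in range(M)]
    inc = [[inc[c][(M + r - 1) % M] for r in range(M)] for c in range(M)]  # circular shift in y
    return ca, ra, inc
```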


2.3.2.  Addressing  horizontal  spans 

We shall assume that when accessing a horizontal span whose left corner is located at (x,y), x is a
multiple of M^2. This assumption is justifiable when this mode of memory access is used only for
screen refresh. The leftmost M pixels of this span are located in chips (0 <= c < M, y%M) and are
addressed by a column address of (x/M) and a row address of (y+x/M)/M. Chips in other rows
shall be addressed by a column address that is a sum of x/M and the difference between the chip row
number and (y%M). Two row addresses are used because the span is likely to cross a stagger
boundary (the one in Figure 2-7 does not). The address for a chip (c,r) can be computed as

CA := x/M + (M + r - (y % M)) % M;
RA := (y + x/M) / M;
if ((y % M) > r) then RA := RA + 1.

This address is easily incremented when the span at x := x + M^2 is to be accessed. It can be done
as

CA(x+M^2,y) := CA(x,y) + M;
RA(x+M^2,y) := RA(x,y) + 1.
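
The span addressing rule can be exercised with a short sketch that compares it with the staggered pixel mapping. Function names are hypothetical and integer division is assumed.

```python
def staggered_location(x, y, M):
    """Chip and address of one pixel in the staggered organization."""
    s = y + x // M
    return (x % M, s % M), (x // M, s // M)


def span_addresses(x, y, M):
    """Per-chip (CA, RA) for the M^2 x 1 span starting at (x, y), x a multiple of M^2."""
    if x % (M * M) != 0:
        raise ValueError("x must be a multiple of M*M")
    addresses = {}
    for c in range(M):
        for r in range(M):
            ca = x // M + (M + r - (y % M)) % M
            ra = (y + x // M) // M
            if (y % M) > r:      # rows that wrapped past the stagger boundary
                ra += 1
            addresses[(c, r)] = (ca, ra)
    return addresses
```

Moving to the next span adds M to every column address and 1 to every row address, which matches the incremental rule above.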

2.4.  Conclusion 

The frame buffer memory organization is the key to achieving high display performance. The
effective utilization of the memory organization depends on an efficient address computation. The
efficiency of the address computation can be increased by simple incremental algorithms to access
parts of the display in close proximity to the previous access. This chapter presents the scan-line and
the square organization, both of which have simple address computation algorithms. The square
organization is more suited for update algorithms but suffers from an expensive screen refresh
implementation. The staggered square organization allows the access of squares for updates and
scan-line spans for screen refresh, but requires expensive address computation algorithms.




Chapter  3 

Flexible  Memory  Organizations  (footnote 3)


The  previous  chapter  discussed  one  memory  organization  which  provides  two  kinds  of  accesses;  it 
provides  the  ability  to  access  squares  for  the  convenience  of  update  algorithms  and  it  can  also  be  used 
to  access  horizontal  spans  for  the  raster  scanning  required  by  screen  refresh.  The  flexibility  of 
accessing  the  memory  in  the  form  desired  by  each  task  effectively  increases  the  total  memory 
bandwidth  and  the  performance  of  the  display  system.  The  screen  refresh  task  requires  accessing  the 
largest  possible  horizontal  spans.  The  claim  of  this  thesis  is  that  if  all  update  tasks  had  the  choice  of 
one  memory  organization  they  would  choose  the  square  organization  discussed  in  the  previous 
chapter. 

If update algorithms can use different geometries to access the display memory, they can choose
one which minimizes the number of memory cycles used. For example, if the memory organization
provides both horizontal and vertical spans, then a line drawing algorithm could use the horizontal
access for flat lines and the vertical access for the steeper lines. The flexibility can be extended to
provide access to rectangles with all different sizes and shapes, in which case the line drawing
algorithm would merely choose a size depending upon the length of the line and a shape depending
upon the slope of the line and update the line in one memory access. The maximum size allowed
determines the maximum bandwidth possible and also the cost of the system. Allowing for all
different sizes requires parallel access to all the pixels of the display and hence cannot make use of
any random access memory technology. If the maximum size were restricted to N, and the shapes
were restricted to be rectangular, then the most flexibility is provided by an organization which
allows the access of all rectangles with sizes 1xN, 2x⌊N/2⌋, 3x⌊N/3⌋, ..., ⌊N/2⌋x2, and Nx1. Such
an organization is useful for the following reasons:


Footnote 3: This chapter is not directly related to the main theme of this thesis and may be omitted
by the casual reader. Dan Hoey provided substantial assistance in obtaining the results in Section 3.3.


1.  The  update  algorithms  will  perform  better  when  they  attempt  to  update  the  screen  using 
the  minimum  number  of  memory  cycles.  For  example,  a  line-drawing  algorithm  can  now 
select  a  rectangle  depending  upon  the  slope  of  the  line  (Figure  3-1). 


Figure 3-1: Line-drawing algorithms can choose rectangular shapes
depending upon the slope of lines.

2.  The same frame buffer memory can be used to configure different types of displays with
different resolutions and gray-levels. For example, we might want a 1024x1024 bit-map display
which allows the access of 8x8 squares or 64x1 spans, or a 512x512 4 bit/pixel display which
allows the access of 4x4 squares or 16x1 spans. If the gray-scale pixel maps onto the bit-map
display as a 2x2 square, then the memory organization has to provide access to 8x8,
64x1, and 32x2 rectangles.

3.  We can drive different output devices from the same frame buffer. If the frame buffer
memory provides both the horizontal and the vertical accesses then we can
simultaneously drive a CRT which scans horizontally and a hard copy printer which may
scan vertically.


The organizations discussed in the previous chapter had the property that all the memory chips in
every organization could provide useful data in each memory access. This implied that the area of
display accessed was equal to the number of memory chips in the system, with a memory chip
defined to be a piece of random access memory that can access one pixel in each memory cycle. The
scan-line organization that allows the access of N pixels along a scan line uses N memory chips; the
square organization that accesses MxM squares uses M^2 memory chips; and the staggered
organization that accesses both MxM squares and M^2x1 spans also uses M^2 memory chips.




When we increase the flexibility of accessing the display memory system, we will see that we need
more memory chips than the maximum size of the display area that can be accessed in each memory
cycle. Hence some of the memory chips will not access useful data in each memory access, although
the set of chips in use is different for different accesses. An attempt should be made to use the
minimum number of memory chips for the organization used, although implementation simplicities
might favor the choice of a more expensive scheme.

Each memory organization can be defined by the mapping of each display pixel to the memory
chip number and represented by a function ChipNumber(x,y), where x and y are the coordinates of
the pixel on the display. Given the memory organization function, two functions define the
computations that have to be performed for each memory chip during every memory access. The first
of these two functions computes the address for each memory chip from the specifications of the
rectangle to be accessed. The second function computes the mask when only a subset of the rectangle
to be accessed is to be written into. The addressing function can be represented as
Address(x,y,dx,dy,c), which computes the address for the chip numbered c when the rectangle with its
top left corner at (x,y) of size (dx,dy) is being accessed. The address function is often broken up into
two parts representing the row and column addresses of the memory chips and can be specified as
RowAddress(x,y,dx,dy,c) and ColAddress(x,y,dx,dy,c). The masking function is represented as
Mask(x,y,dx,dy,mdx,mdy,c) and computes the mask for the chip numbered c when the rectangle
specified by x, y, dx, and dy is being accessed and only the top left subrectangle of size (mdx,mdy) is to
be written into. This function will return a zero for the chips that have to be enabled and one for the
others. The following table contains these functions for the scan-line, square, and the staggered
organizations. The comparisons return one if they succeed and zero otherwise; they represent the
address increment required by the memory chips that cross word boundaries.


Scan-line organization for a display sized 1024x1024.
Memory chips are numbered 0 <= c <= 15.

i := x<3:0>
ChipNumber(x,y) := i
RowAddress(x,y,16,1,c) := y<1:0>·x<9:4> + (c < i)
ColAddress(x,y,16,1,c) := y<9:2>
Mask(x,y,16,1,mdx,1,c) := ((16 + c - i) % 16) >= mdx.

Symmetric square organization for a display sized 1024x1024.
Memory chips are numbered (c,r) where (0 <= c <= 7) and (0 <= r <= 7).

i := x % 8; j := y % 8
ChipNumber(x,y) := (i,j)
ColAddress(x,y,8,8,c,r) := x/8 + (c < i)
RowAddress(x,y,8,8,c,r) := y/8 + (r < j)
Mask(x,y,8,8,mdx,mdy,c,r) := (((8 + c - i) % 8) >= mdx)
                             OR (((8 + r - j) % 8) >= mdy).

Staggered square organization for a display sized 1024x1024.
Memory chips are numbered (c,r) where (0 <= c <= 7) and (0 <= r <= 7).

i := x % 8; j := (y + x/8) % 8
ChipNumber(x,y) := (i,j)
ColAddress(x,y,8,8,c,r) := x/8 + (c < i)
RowAddress(x,y,8,8,c,r) := (y + x/8)/8 + (r < (j + (c < i)))
Mask(x,y,8,8,mdx,mdy,c,r) := (((8 + c - i) % 8) >= mdx)
                             OR (((8 + r - (j + (c < i))) % 8) >= mdy).

When x is a multiple of 64:

ColAddress(x,y,64,1,c,r) := x/8 + (8 + r - (y % 8)) % 8
RowAddress(x,y,64,1,c,r) := (y + x/8)/8 + ((y % 8) > r).

Table 3-1
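
The scan-line entries of the table can be tried out with a short sketch. It rests on one labeled assumption: the expression y<1:0>·x<9:4> is read as the concatenation of the two bit fields, with y<1:0> as the high-order bits. The function names are hypothetical.

```python
def chip_number(x, y):
    """ChipNumber(x, y): the low four bits of x."""
    return x & 0xF


def pixel_row_address(x, y):
    """Row address of a single pixel; assumes y<1:0> is concatenated above x<9:4>."""
    return ((y & 0x3) << 6) | ((x >> 4) & 0x3F)


def row_address(x, y, c):
    """RowAddress(x, y, 16, 1, c): chips left of the offset take the next word."""
    i = x & 0xF
    return pixel_row_address(x, y) + (1 if c < i else 0)


def col_address(x, y, c):
    """ColAddress(x, y, 16, 1, c): the upper eight bits of y."""
    return (y >> 2) & 0xFF


def mask(x, mdx, c):
    """Mask(x, y, 16, 1, mdx, 1, c): 0 enables the chip, 1 inhibits it."""
    i = x & 0xF
    return 1 if ((16 + c - i) % 16) >= mdx else 0
```

For any 16-pixel span that stays on the screen, each chip receives the row address of the one span pixel it holds.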


The  rest  of  the  chapter  discusses  some  more  flexible  organizations,  and  describes  their 
characteristic  functions.  The  organizations  restrict  themselves  to  accessing  rectangular  geometries. 


3.1.  Row-Column  organization 


The first unconventional organization is one that allows parallel access of any adjacent set of N
pixels along rows or columns. This organization has potential application in displays which drive two
different output devices, one of which is scanned along rows and the other scanned along columns.
Update algorithms behave symmetrically along either axis, although diagonal updates can be
performed only one pixel at a time.


This organization requires that all pixels in every adjacent set be located in different memory
chips. Figure 3-2 shows one mapping from pixel location to memory chip number for N = 4. This
mapping staggers the chip numbers in consecutive rows but still requires only N memory chips. For
this organization

ChipNumber(x,y) = (x+y) % 4

        x=0      x=1      x=2      x=3      x=4      x=5      x=6
y=0   0 (0,0)  1 (0,0)  2 (0,0)  3 (0,0)  0 (2,0)  1 (2,0)  2 (2,0)
y=1   1 (0,1)  2 (0,1)  3 (0,1)  0 (0,1)  1 (2,1)  2 (2,1)  3 (2,1)
y=2   2 (1,0)  3 (1,0)  0 (1,0)  1 (1,0)  2 (3,0)  3 (3,0)  0 (3,0)
y=3   3 (1,1)  0 (1,1)  1 (1,1)  2 (1,1)  3 (3,1)  0 (3,1)  1 (3,1)
y=4   0 (0,2)  1 (0,2)  2 (0,2)  3 (0,2)  0 (2,2)  1 (2,2)  2 (2,2)
y=5   1 (0,3)  2 (0,3)  3 (0,3)  0 (0,3)  1 (2,3)  2 (2,3)  3 (2,3)
y=6   2 (1,2)  3 (1,2)  0 (1,2)  1 (1,2)  2 (3,2)  3 (3,2)  0 (3,2)
y=7   3 (1,3)  0 (1,3)  1 (1,3)  2 (1,3)  3 (3,3)  0 (3,3)  1 (3,3)

Figure 3-2: Row-Column organization.
Each entry gives the chip number followed by the (column, row) address.

Figure 3-2 also shows in parentheses the row and column addresses required to address the
memory chips. The addressing mechanism is symmetric over 4x4 squares but asymmetric within each
4x4 square. It addresses the pixel with a column address of (2(x/4) + (y%4)/2) and a row
address of (2(y/4) + y%2). This addressing scheme is symmetric in a way because the column
addresses use only higher order bits of x and row addresses use only higher order bits of y but both of
them use one lower order bit of y each. The addressing and masking functions are

i := (x + y%4) % 4; j := (y + x%4) % 4

ColAddress(x,y,4,1,c,r) := 2(x/4) + (y%4)/2 + 2(c < i)
RowAddress(x,y,4,1,c,r) := 2(y/4) + y%2
Mask(x,y,4,1,mdx,1,c,r) := ((4 + c - i)%4) >= mdx

ColAddress(x,y,1,4,c,r) := 2(x/4) + (y%4)/2
RowAddress(x,y,1,4,c,r) := 2(y/4) + y%2 + 2(c < j)
Mask(x,y,1,4,1,mdy,c,r) := ((4 + c - j)%4) >= mdy.
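
The pixel mapping behind Figure 3-2 can be explored with a small sketch. The per-chip addresses for a span are derived here directly from the per-pixel mapping rather than from the closed-form functions; the function names are hypothetical.

```python
N = 4


def chip_number(x, y):
    """ChipNumber(x, y) = (x + y) % 4."""
    return (x + y) % N


def pixel_address(x, y):
    """(column address, row address) of a pixel, as given in the text for N = 4."""
    return (2 * (x // 4) + (y % 4) // 2, 2 * (y // 4) + y % 2)


def span_addresses(x, y, horizontal=True):
    """Address each chip needs for the 4-pixel row or column span starting at (x, y)."""
    pixels = [(x + k, y) if horizontal else (x, y + k) for k in range(N)]
    return {chip_number(px, py): pixel_address(px, py) for px, py in pixels}
```

Every row span and every column span of four pixels yields four distinct chips, and no two pixels share both a chip and an address.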

3.2.  Row-Column-Square  organization 

The previous chapter presented a staggered organization that could be used to access both squares
and horizontal spans; the same technique can be used to access both squares and vertical spans. The
previous section presented an organization that can be used to access both horizontal and vertical
spans. All these organizations use the same number of memory chips as the area of the rectangles
being accessed. But when we try to access squares, horizontal and vertical spans of area M^2 we need
M^2+1 memory chips. Figure 3-3 shows one possible memory mapping for M = 4 that is used to
access 4x4 squares and both 16x1 and 1x16 spans. This memory mapping relies on the fact that M is
relatively prime to M^2+1. The ChipNumber function increments by 1 along the x-direction and by M
along the y-direction. For M = 4, this function is

ChipNumber(x,y) := (x+4y) % 17.

Assume that the pixel at (x,y) is addressed by (x/17)·y. There is no preferred way for splitting this
into row and column addresses and hence we shall leave the address undivided.

Horizontal spans can be addressed using the following functions:

i := x % 17

Address(x,y,17,1,c) := (x/17 + (c < i))·y
Mask(x,y,17,1,mdx,1,c) := ((17 + c - i) % 17) >= mdx.

Accessing vertical spans can be eased by the presence of a two-dimensional vertydist[a][b] table
which is indexed by chip numbers in both the dimensions. If the value of a is the chip number of the
top of a span then the table will give the vertical distance to the next lower pixel with a chip number
equal to b. So for example, vertydist[0][8] = 2, and vertydist[16][2] = 5. Given the presence of such a
precomputed table the addressing functions for vertical spans are

d := ChipNumber(x,y)

Address(x,y,1,17,c) := (x/17)·(y + vertydist[d][c])
Mask(x,y,1,17,1,mdy,c) := vertydist[d][c] >= mdy.
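
One way such a table could be precomputed is sketched below, using only the ChipNumber function given above. The builder name is hypothetical; the two asserted entries in the usage note are the examples quoted in the text.

```python
CHIPS = 17


def chip_number(x, y):
    """ChipNumber(x, y) = (x + 4y) % 17."""
    return (x + 4 * y) % CHIPS


def build_vertydist():
    """vertydist[a][b]: rows to move down from a pixel in chip a to reach chip b."""
    table = [[0] * CHIPS for _ in range(CHIPS)]
    for a in range(CHIPS):
        for dy in range(CHIPS):
            # each step down adds 4 to the chip number, modulo 17
            table[a][(a + 4 * dy) % CHIPS] = dy
    return table
```

Since 4 and 17 are relatively prime, each row of the table is a permutation of 0..16, and `build_vertydist()[0][8]` and `build_vertydist()[16][2]` reproduce the values 2 and 5 quoted above.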

Figure 3-3: Row-Column-Square organization

Accessing squares can be done in a manner similar to the one used to access vertical spans, with the
difference of requiring two precomputed tables, both of which are similar to the vertydist table. The first
one is squarexdist, which gives the x-distance from the corner memory chip, which is the first
parameter, to the chip which is the second parameter. The second table is squareydist, whose result is
the y-distance in a manner similar to the first table. As an example squarexdist[0][9] = 1,
squareydist[0][9] = 2, squarexdist[14][3] = 2, and squareydist[14][3] = 1. The addressing functions for
such a formulation are

d := ChipNumber(x,y)

Address(x,y,4,4,c) := ((x + squarexdist[d][c])/17)·(y + squareydist[d][c])
Mask(x,y,4,4,mdx,mdy,c) := (squarexdist[d][c] >= mdx)
                           OR (squareydist[d][c] >= mdy).

This  technique  of  using  precomputed  tables  can  be  extended  to  general  rectangular  accesses  by  using 
two  precomputed  tables  for  each  geometry  to  be  accessed. 
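
The two square tables could be filled in as sketched below. The builder name is hypothetical, and marking the one chip that lies outside the 4x4 square with `None` is an assumption of this sketch; the text does not say how that entry is stored.

```python
CHIPS = 17
M = 4


def build_square_tables():
    """squarexdist[d][c] and squareydist[d][c] for a 4x4 square whose corner is in chip d."""
    xdist = [[None] * CHIPS for _ in range(CHIPS)]
    ydist = [[None] * CHIPS for _ in range(CHIPS)]
    for d in range(CHIPS):
        for dy in range(M):
            for dx in range(M):
                c = (d + dx + M * dy) % CHIPS  # chip of the pixel at offset (dx, dy)
                xdist[d][c] = dx
                ydist[d][c] = dy
    return xdist, ydist
```

For a corner in chip 14, chip 3 holds the pixel at offset (2, 1), which agrees with the example entries in the text.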




3.3.  General  rectangular  organization 

We are now going to extend the memory organization to allow the access of every rectangle on the
display with an area of less than or equal to N in one memory access. The key question is to
determine the minimum number of memory chips that are required to implement any such
organization. The most obvious implementation puts pixels spaced N apart in both directions in the
same memory chips (Figure 3-4). The area enclosed by such pixels is N^2 and since all these pixels
have to be located in different memory chips the number of memory chips required is N^2. An
improvement to this implementation is shown in Figure 3-5. The repetition pattern in this
implementation is trapezoidal instead of being square and requires only N^2/2 memory chips.


Figure 3-4: Generalized rectangles using N^2 chips


In order to improve the chip count we look at the bounds of one pixel, in the sense that once this
pixel has been placed in any memory chip, all pixels enclosed within the boundary cannot be in the
same memory chip (Figure 3-6). For simplicity we assume that the pixel whose bounds we are
considering is located at (0,0). The curves on the boundary are hyperbolas because the size of the
rectangles bounds the outlines to xy = N. Suppose we can place one pixel on the boundary located
at (a,b) in the same memory chip as the pixel in the center. Symmetry would dictate that we can also
place the pixels at (-b,a), (-a,-b), and (b,-a) in the same memory chip. To access all rectangles of
size N, no pixel should be within the boundary of another pixel in this set. This constraint can be
stated as:


Figure 3-6: Bounds of a pixel when all rectangles are allowed to be accessed.
No pixels within the boundary can be in the same chip as the central pixel.


In order to complete this discussion we have to prove that when the proposed pattern is repeated,
no pixels are within the bounds of any other pixel. To prove this, we shall consider any arbitrary pixel
in the pattern and show that it does not lie within the bounds of the pixel at (0,0). The coordinates of
all pixels in a pattern are formed by a certain number of moves in the (a, N/a) direction and a certain
number of moves in the (-N/a, a) direction. If the number of moves in the first direction is p and
the number of moves in the second direction is q then the coordinates of such a point are
(pa - qN/a, pN/a + qa). In order that this point lie outside the bounds of the origin we must have

( pa-qN/a)(pi\/a+qa )*  N 
or 

p*N  -  q*N  +  Npqic?-  1/a2)  £  N 
Since  a2  -  1/a2  =  1,  we  have 


Fl.IiXIBLK  MliMORY  ORGANIZATIONS 


Figure  3-7:  Pixels  contained  in  one  chip  in  the  general  rectangular  organization 
Nip 2  -  (?  +  pq)  >  N 

which  is  true  because  p2  -  q1  +  pq  =  0  has  no  integer  solutions. 


The above formulation provides a non-integer bound on the number of memory chips required.
The problem is, however, an integer problem and the values used in the mapping have to be integers.
Let us consider the repetition pattern in Figure 3-7 assuming that a and b are integers. In order for
(a,b) to be outside the bounds of (0,0) we must have (a+1)(b+1) > N (ab could be less than N
because there might not be a closer way to factor N). Also in order for (a,b) to be outside (b,-a) we
must have (a-b+1)(a+b+1) > N. The smallest values of a and b to satisfy these inequalities can be
used in the organization. Figure 3-9 plots the number of memory chips required for different
rectangular sizes. The straight line represents √5 N. The figure confirms our √5 N approximation for
the number of chips required to implement a general rectangular organization.

For N = 16, a = 5 and b = 2 are the smallest integers to satisfy the two inequalities. This memory
organization hence requires 29 (= a^2 + b^2) memory chips to enable the access of all rectangles with
an area less than or equal to 16. The pixel located at (5,2) will be contained in the same memory chip
as the pixel located at (0,0). In Figure 3-7 we had observed that all pixels spanned by moves in the
(a,b) and (-b,a) directions are contained in the same memory chip. Hence all pixels located at
(5p-2q, 2p+5q), for all integer values of p and q, are placed in the same memory chip as the pixel
located at (0,0). The next pixel in the first row placed in the same memory chip as the pixel at (0,0) is
located at (29,0), when p = 5 and q = -2. We can hence place all pixels from (0,0) to (28,0) in the 29
different memory chips resulting in a memory mapping shown in Figure 3-8. The closest pixel in the
second row which is in the same memory chip as the pixel at (0,0) is located at (17,1), corresponding
to p = 3 and q = -1. To preserve a unit increment in chip numbers in all rows we can start by
placing the pixel at (0,1) in chip number 12, which will result in the pixel at (17,1) being in chip
number 0. The ChipNumber function for this organization can hence be stated as

ChipNumber(x,y) := (12y + x) % 29.
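The N = 16 example can be checked mechanically. The sketch below is illustrative only; the helper names (smallest_ab, row_multiplier, conflict_free) and the brute-force search are assumptions, not part of the thesis. It searches for the integers a and b that satisfy the two inequalities with the fewest chips, derives the row multiplier 12 from the requirement that (a,b) and (-b,a) share chip 0, and verifies that every rectangle of area at most 16 touches distinct chips.

```python
from itertools import product

def smallest_ab(n):
    """Integers a >= b >= 0 satisfying both inequalities with the fewest chips."""
    best = None
    for a in range(1, n + 1):
        for b in range(0, a + 1):
            if (a + 1) * (b + 1) > n and (a - b + 1) * (a + b + 1) > n:
                if best is None or a * a + b * b < best[0] ** 2 + best[1] ** 2:
                    best = (a, b)
    return best

def row_multiplier(a, b):
    """Find k such that (k*y + x) % C is 0 at both (a, b) and (-b, a)."""
    chips = a * a + b * b
    for k in range(chips):
        if (a + k * b) % chips == 0 and (-b + k * a) % chips == 0:
            return k
    return None

def chip_number(x, y, k=12, chips=29):
    """ChipNumber(x,y) := (12y + x) % 29 for the N = 16 organization."""
    return (k * y + x) % chips

def conflict_free(n, k, chips, window=30):
    """Brute force: each w x h rectangle with w*h <= n touches w*h distinct chips."""
    for w in range(1, n + 1):
        for h in range(1, n // w + 1):
            for x0, y0 in product(range(window), repeat=2):
                used = {chip_number(x0 + i, y0 + j, k, chips)
                        for i in range(w) for j in range(h)}
                if len(used) != w * h:
                    return False
    return True

if __name__ == "__main__":
    a, b = smallest_ab(16)
    print(a, b, a * a + b * b, row_multiplier(a, b), conflict_free(16, 12, 29))
```

The brute-force check is practical only for small N; it is meant to confirm the worked example rather than to replace the argument in the text.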


Figure 3-8: General rectangular memory organization for N = 16


Address and mask computations are performed in a manner similar to the Row-Column-Square
organization. A pair of tables is precomputed for each rectangular shape accessed. Each pair of tables
is indexed by two chip numbers and identified as xdistdxdy and ydistdxdy where dx x dy is the size of
the rectangle being accessed, and the contents are similar to the contents of the squarexdist and
squareydist tables of the Row-Column-Square organization. The value of xdistdxdy[e][f] will be the
x-distance between two pixels in a dx x dy rectangle when the top corner pixel of the rectangle is in
memory chip number e and the second pixel is in chip number f; ydistdxdy[e][f] results in the
y-distance with the same constraints. Since there are approximately 2√N rectangular shapes being
accessed, there will be 2√N pairs of such tables each with approximately 5N^2 entries (assuming that
√5 N memory chips are required to implement the memory system). Using these precomputed
tables, the addressing and masking require the following functions. C is the number of chips actually
used in the memory system.

d := ChipNumber(x,y)

Address(x,y,dx,dy,c) := (x + xdistdxdy[d][c]) ■ (y + ydistdxdy[d][c])/C

Mask(x,y,dx,dy,mdx,mdy,c) := (xdistdxdy[d][c] > mdx)
                             OR (ydistdxdy[d][c] > mdy)

3.4. Conclusion

The flexible frame buffer memory organizations presented in this chapter can be used to improve
the performance of displays. As can be expected, the flexibility results in an increased complexity
both in the hardware implementation of the memory organization and the application software which
attempts to use this flexibility. The increase in hardware complexity results from the fact that there
are no easy implementations of the addressing computation, nor any obvious incremental algorithms.
In Chapter 4, we will see that a large class of applications copy data read from one part of the display
to another. This requires the potential movement of data from one memory chip to another. The
memory organizations presented in the previous chapter have simple implementations for this data
movement; they will be discussed in Chapter 4. The flexible memory organizations of this chapter
result in horrible data movement patterns which need expensive implementations.

The flexible memory organizations are a high performance, high cost point in the
cost-performance curve for display designs.




Chapter  4 
BITBLT 


A  large  class  of  display  applications  require  extremely  simple  transformations  on  pixels  already 
stored  in  the  frame  buffer.  These  transformation  techniques  can  be  used  to  save  the  often  inefficient 
task  of  recomputing  the  image.  The  most  common  example  is  the  task  of  displaying  characters.  If  a 
character  is  already  in  the  frame  buffer,  then  the  task  of  displaying  the  same  character  at  a  different 
location  can  be  performed  by  merely  copying  the  description  from  its  previous  location.  The 
usefulness  of  such  a  mechanism  increases  even  more  when  the  frame  buffer  is  larger  than  the 
displayed  area  and  allows  the  storage  of  precomputed  images  that  do  not  get  displayed.  All  characters 
can  now  be  displayed  by  moving  individual  character  descriptions  from  fonts  stored  in  the  invisible 
part  of  the  frame  buffer. 

The display processor associated with the frame buffer can be programmed to provide all such
required transformations. The host computer chooses the transformation desired and sends its
parameters to the display processor. In the case of characters, the code of the character and its
destination is sufficient information for the display processor to move the character from the font
definitions to the desired location.


The type of transformations required is governed by the applications desired. We shall present a
sample list of tasks and the type of transformations they require.

1. Characters from fonts - If fonts can be stored within the frame buffer, then placing
character strings is simply a matter of moving individual characters from the fonts to the
desired locations. Standard ASCII terminals implement a similar mechanism, with the
difference that they do not actually move the character descriptions but simply read the
fonts while generating the video output for the CRT display. The frame buffer approach
is more general because it allows the use of multiple fonts with varying sizes, scripts, and
even alphabets. This application requires the ability to copy a rectangular region from one
part of the display to another. The copy operation can also be used for other frequently
used operations like clearing the display or displaying background or boundary patterns,
which can be done by moving the same fixed pattern to different locations in order to
cover the desired area.


2. Scrolling, Windowing - Scrolling images stored within frame buffers requires moving the
image description in memory. The movement is necessitated by the fact that an arbitrary
rectangular window might have to be scrolled and the scrolling can be in any direction
[Sproull 79]. Simple scrolling operations which scroll the whole screen only in one
direction can make use of hardware tricks like starting the video scan at an arbitrary scan
line. Such tricks cannot be used in the general case, when the only solution is to move the
contents of the frame buffer memory. Scrolling requires the rectangular copy operation to
move part of the scrolled image to its new location (Figure 4-1). The strip emptied is now
cleared by a clear operation or recreated by a combination of other operations. The same
sequence of operations can be used to scroll in any of the four directions.


Figure  4-1:  Scrolling 

3. Temporary images - Interactive graphics makes extensive use of temporary images to
convey momentary pieces of information like command menus, error messages, or status
reports. These images are often put on top of other existing images which sometimes also
have to be displayed and almost certainly restored after the temporary image is removed.
The temporary images can also be stored in the invisible portion of the frame buffer with
the fonts and simply copied into the visible area when desired. A copy deletes the
contents of the existing image which should be saved before performing the copy and
restored when the temporary image can be removed. In order to achieve an overlay of the
two images, the temporary image can either be ORed or XORed into the existing image.
These binary operations can be used only to combine bit-map images. The OR operation
simply overlays one image on top of the other and corresponds to the logical OR when
the two images have the same background color and the logical OR of one with the
complement of the other in the case when images have different backgrounds. This OR
operation destroys the contents of the old image which hence has to be saved and restored
using the same mechanism as in the copy operation. The XOR operation preserves
information because applying this operation twice will restore the contents of the original
image (Figure 4-2).

Figure 4-2: OR and XOR operations

4. Cursors - Cursors can be treated in the same manner as temporary images with the only
difference being that cursors have to be moved extremely fast, usually to keep pace with
some sort of pointing device. The speed requirement is the reason that most display
designs provide special hardware which performs the cursor task, usually by overlaying
the cursor pattern with the image during the video generation for the screen. However, if
the graphics processor can perform the combination operation (including saving and
restoring the original image if the information preserving transformation is not being
used) fast enough, then the need for the special hardware can be eliminated. This also
allows the cursor to have arbitrary shape and size, a feature not present in most special
hardware implementations.

5.  Highlighting  -  Instant  feedback  requires  highlighting  selected  parts  of  the  image  and  is 
usually  provided  by  changes  in  intensity,  background  color,  or  by  simply  flashing  the 
selected  part.  Intensity  change  can  be  performed  in  many  different  ways  such  as  changing 
the  grayscale  intensity.  Flashing  can  be  performed  by  alternately  clearing  the  region  and 
restoring  the  original  contents.  Background  change  requires  the  invert  operation  which 
complements  the  bit-map  of  the  selected  image. 

6. Browsing through large images - Displays are often used to browse through large images
which cannot be fully viewed in one screenful. Two examples of this are viewing the
checkplot of a large VLSI chip, or browsing through a large digitized terrain map. The
extra space in the invisible part of the display can be used to store adjoining areas of the
image that is being displayed and hence speed up small movements of the image. The
standard scrolling operations can be used in the case of moving the image within the
display memory. However, even the whole display memory may not be sufficient to store
the entire image, which then has to be stored in host computer memory or, worse still, in
secondary memory. In this case the display and host processor have to coordinate in a fast
transfer of image data to update the display memory. This operation shall be called
external transfer and should be (if possible) as fast as the internal copy operation.

7. Graphics - The next few chapters of this thesis discuss the generation of graphical images
through the use of precomputed primitives. Lines can be generated by joining together
small strokes and filled regions by tiling together small patches. All algorithms that
generate graphical images in such a manner tend to produce better output when a larger
set of primitives is used, although this set can be greatly reduced if these building blocks
can be mirrored and/or transposed before being put into place. Once again the invisible
part of the frame buffer can be used to store these primitives and they can be moved to
the appropriate part of the display to form the image. The key issue in these applications
is the combination function to be used when combining the stroke or the patch with the
existing image. The combination function depends upon the application and is much
more complicated for gray-scale graphics than for bit-map graphics. The most popular
bit-map function used is the OR operation.
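The scrolling task in the list above reduces to one rectangular copy followed by one clear. A minimal sketch, assuming the frame buffer is modelled as a Python list of rows and using hypothetical helper names (copy_rect, clear_rect, scroll_up) that do not appear in the thesis:

```python
def copy_rect(raster, xs, ys, xd, yd, w, h):
    """Copy a w x h rectangle; the snapshot keeps overlapping copies correct."""
    block = [row[xs:xs + w] for row in raster[ys:ys + h]]
    for j in range(h):
        raster[yd + j][xd:xd + w] = block[j]

def clear_rect(raster, x, y, w, h, value=0):
    """Fill a w x h rectangle with a constant."""
    for j in range(y, y + h):
        raster[j][x:x + w] = [value] * w

def scroll_up(raster, x, y, w, h, k):
    """Scroll the w x h window at (x, y) up by k rows, with 0 < k <= h."""
    copy_rect(raster, x, y + k, x, y, w, h - k)   # move the surviving rows up
    clear_rect(raster, x, y + h - k, w, k)        # clear the emptied strip

if __name__ == "__main__":
    screen = [[10 * r + c for c in range(4)] for r in range(4)]
    scroll_up(screen, 1, 0, 2, 4, 1)
    for row in screen:
        print(row)
```

The snapshot in copy_rect sidesteps the ordering problem for overlapping rectangles; a display processor would instead choose the copy direction, as the BitBlt procedure later in this chapter does.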

4.1. BitBlt operations

All the operations discussed so far have one common theme. They take two regions in different
parts of the display memory, combine them using some sort of combination function, and write the
result into one of the two parts. If the two parts are identified as the source and destination, then the
operation can be defined as

destination := source op destination

We shall restrict both source and destination areas to be rectangular and of the same size. Such an
operation has been historically known as BitBlt (the name originates from the Block Transfer
instruction of the PDP-10), and can be procedurally written as

BitBlt(Srcrect,Dstrect,Op),

where Srcrect and Dstrect specify the source and destination rectangular regions, and Op specifies the
combination function. The Srcrect and Dstrect can overlap, in which case a valid order has to be
chosen for the execution of the operation in order to preserve the correctness of the operation. As an
example, if the top left corner of the Srcrect is inside the Dstrect, then a valid order for the operation
is to start at the top right corner and proceed to the left and downwards. The four different
possibilities are shown in Figure 4-3.


Figure 4-3: Overlapping BitBlt rectangles


The mirroring and transposition transformations provide a more general set of transformations
which will then include rotation by a multiple of 90 degrees (Figure 4-4). If we allow the source
rectangle to undergo a combination of the x-mirror, y-mirror, and transposition transformations then
we cannot allow the source and destination rectangles to overlap because there does not exist a valid
sequence which will guarantee correct results in all circumstances.


[Panels: Original; X-Mirror; Y-Mirror; Transpose; 90 degree rotation (Y-Mirror -> Transpose);
180 degree rotation (X-Mirror -> Y-Mirror); 270 degree rotation (X-Mirror -> Transpose)]

Figure 4-4: Rotation using mirroring and transposition as basis
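The compositions named in Figure 4-4 can be tried on a small array. The sketch below is illustrative: it assumes that the x-mirror reflects about the x-axis (reversing the order of the rows) and the y-mirror reflects about the y-axis (reversing the pixels within each row), and that images are lists of rows. Under that assumed convention the three listed compositions behave as rotations by 90, 180 and 270 degrees.

```python
def x_mirror(img):
    """Reflect about the x-axis: reverse the order of the rows."""
    return [row[:] for row in reversed(img)]

def y_mirror(img):
    """Reflect about the y-axis: reverse the pixels within each row."""
    return [row[::-1] for row in img]

def transpose(img):
    """Exchange rows and columns."""
    return [list(col) for col in zip(*img)]

def rot90(img):
    """Y-Mirror followed by Transpose."""
    return transpose(y_mirror(img))

def rot180(img):
    """X-Mirror followed by Y-Mirror."""
    return y_mirror(x_mirror(img))

def rot270(img):
    """X-Mirror followed by Transpose."""
    return transpose(x_mirror(img))

if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6]]
    print(rot90(img))    # one quarter turn
    print(rot180(img))   # same as two quarter turns
    print(rot270(img))   # same as three quarter turns
```

Only two mirror primitives and one transpose primitive are needed; every other orientation is a composition of them.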


Tables 4-1 and 4-2 tabulate some BitBlt operations and their possible applications. Table 4-1
contains boolean operations useful for bit-map displays or single-bit planes of gray-scale displays and
Table 4-2 contains some of its gray-scale generalizations. The Min and Max functions are the
gray-scale equivalents of the boolean AND and OR operations. If 1 is assumed to be the maximum
gray-scale intensity, then (1 - intensity) is the equivalent of the logical inversion of a boolean
intensity. The sum function is equivalent to the boolean XOR, because both the operations preserve
the information content of the original image. Applying the XOR operation twice restores the original
image, and similarly the difference operation can restore the image before the sum operation.
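The information-preserving property can be seen on a toy example. This is a minimal sketch with assumed names (blt and the operator helpers are not from the thesis), using lists of integers as images and ignoring any clipping of gray-scale sums at the maximum intensity:

```python
def blt(dst, src, op):
    """Return a new image with op(source pixel, destination pixel) applied pixel by pixel."""
    return [[op(s, d) for s, d in zip(src_row, dst_row)]
            for src_row, dst_row in zip(src, dst)]

def xor_op(s, d):
    return s ^ d

def or_op(s, d):
    return s | d

def sum_op(s, d):
    return d + s          # assumption: no saturation at the maximum intensity

def difference_op(s, d):
    return d - s

if __name__ == "__main__":
    image = [[0, 1, 1], [1, 0, 0]]
    cursor = [[1, 1, 0], [0, 1, 0]]
    # XOR twice restores the image; OR does not.
    print(blt(blt(image, cursor, xor_op), cursor, xor_op) == image)
    print(blt(image, cursor, or_op) == image)
    gray = [[3, 7], [0, 2]]
    patch = [[1, 1], [4, 0]]
    # Difference undoes sum.
    print(blt(blt(gray, patch, sum_op), patch, difference_op) == gray)
```

If the hardware clips sums to a maximum intensity, the sum is no longer invertible for the clipped pixels, which is why the sketch states the no-saturation assumption explicitly.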


Function    Operation               Usage
Clear       Dst := constant         Clearing, Initializing
Constant    Dst := fixed pattern    Background, Boundary
Copy        Dst := Src              Text, Scrolling
Invert      Dst := NOT Dst          Highlighting
AND         Dst := Src AND Dst      A combination function
OR          Dst := Src OR Dst       Another combination function
XOR         Dst := Src XOR Dst      A non-destructive function

Table 4-1: BitBlt operations

Function      Operation              Usage
Minimum       Dst := MIN(Src,Dst)    Gray-scale AND
Maximum       Dst := MAX(Src,Dst)    Gray-scale OR
Complement    Dst := 1 - Dst         Gray-scale Invert
Sum           Dst := Src + Dst       Gray-scale XOR
Difference    Dst := Dst - Src       Gray-scale XOR
Reverse Diff  Dst := Src - Dst       Gray-scale XOR
Scale         Dst := c * Dst         Change contrast
Brightness    Dst := Dst + c         Change brightness

Table 4-2: Gray-scale BitBlt operations

One possible generalization to allow more flexible geometries would be to allow trapezoidal
regions instead of rectangles. This generalization will allow operations on arbitrary polygons to be
performed by splitting them into trapezoids by drawing horizontal lines at each vertex (Figure 4-5).
The trapezoid can be further generalized by allowing its vertical segments to be curves instead of
straight lines. This generalization allows us to perform BitBlt on geometries outlined
by curves [Ball 81].


Figure  4-5:  Polygons  can  be  split  up  into  trapezoids 


4.2. BitBlt implementation


This section describes the hardware and software implementations of BitBlt. A simple loop can be
used to iterate over all pixels in both the source and destination rectangular regions and perform the
desired operation on each pair of pixels in the two regions. This loop should carefully choose the
order in which the pixels are written in order to achieve the correct operation in the cases when the
source and destination rectangles overlap. If an incorrect order is used for overlapping regions, then
one of the source points could be overwritten before it is moved to its destination. The different
possibilities are treated appropriately in the following procedure for BitBlt. (xs,ys) and (xd,yd)
represent the top left corners of the source and destination rectangles respectively, w and h are the
widths and the heights of the rectangles, and op is the BitBlt operation.

PROCEDURE BitBlt(xs,ys,xd,yd,w,h,op)
begin
    xdir := 1; ydir := 1;
    if xs < xd then begin
        xdir := -1; xs := xs + w - 1; xd := xd + w - 1;
    end;
    if ys < yd then begin
        ydir := -1; ys := ys + h - 1; yd := yd + h - 1;
    end;
    y1 := ys; y2 := yd;
    for i := 0 to h-1 do begin
        x1 := xs; x2 := xd;
        for j := 0 to w-1 do begin
            Raster(x2,y2) := Raster(x1,y1) op Raster(x2,y2);
            x1 := x1 + xdir; x2 := x2 + xdir;
        end;
        y1 := y1 + ydir; y2 := y2 + ydir;
    end;
end;
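The procedure translates almost line for line into a runnable form. The following is a sketch under stated assumptions: the raster is modelled as a list of rows indexed raster[y][x], and op is a Python function taking (source pixel, destination pixel); neither detail comes from the thesis.

```python
def bitblt(raster, xs, ys, xd, yd, w, h, op):
    """Pixel-at-a-time BitBlt that picks a safe traversal order for overlaps."""
    xdir = ydir = 1
    if xs < xd:
        # Source lies to the left of the destination: walk right to left.
        xdir = -1
        xs += w - 1
        xd += w - 1
    if ys < yd:
        # Source lies above the destination: walk bottom to top.
        ydir = -1
        ys += h - 1
        yd += h - 1
    y1, y2 = ys, yd
    for _ in range(h):
        x1, x2 = xs, xd
        for _ in range(w):
            raster[y2][x2] = op(raster[y1][x1], raster[y2][x2])
            x1 += xdir
            x2 += xdir
        y1 += ydir
        y2 += ydir

if __name__ == "__main__":
    def copy_op(src, dst):
        return src

    screen = [[10 * y + x for x in range(8)] for y in range(8)]
    bitblt(screen, 2, 3, 3, 4, 4, 3, copy_op)   # overlapping copy, down and right
    for row in screen:
        print(row)
```

Running the same copy with the loops forced in the forward direction would overwrite source pixels before they are read whenever the destination lies below or to the right of the source.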


The  mirroring  and  transposition  can  be  easily  accommodated  into  this  procedure  because 
mirroring  only  changes  the  direction  of  one  of  the  two  nested  loops  and  transposition  transposes  the 
loops  for  either  the  source  or  the  destination  rectangle. 

All BitBlt operations can be performed on many pixels in parallel provided the display architecture
allows  parallel  updates.  Such  parallelism  can  greatly  enhance  the  performance  and  is  even  necessary 
for  some  interactive  applications.  Chapter  2  discussed  several  parallel  architectures  which  allow  the 
update  of  several  pixels  in  each  memory  cycle.  Such  architectures  are  implemented  by  splitting  the 
frame  buffer  into  several  memory  chips;  each  memory  chip  can  be  read  or  written  during  each 
memory  cycle. 


Parallel  implementations  introduce  two  new  problems.  The  first  one  is  that  a  pixel  read  from  one 
memory  chip  might  have  to  be  combined  with  a  pixel  in  a  different  memory  segment.  This  is  due  to 
the  fact  that  the  source  and  destination  rectangles  may  not  have  the  same  alignment  with  respect  to 
the  word  boundaries  imposed  by  the  parallel  architecture.  The  second  is  the  need  for  the  ability  to 
update  a  subset  of  the  memory  chips  in  order  to  accommodate  small  area  BitBlts  and  the  end- 
conditions  of  larger  ones. 

The next four sub-sections discuss the parallel implementations for the scan-line and the square
memory organizations. For each of these organizations, the first algorithm presented assumes the
ability to make arbitrarily aligned memory accesses; the second algorithm restricts the access to word
boundaries.


4.2.1. Nx1 arbitrary access


This memory organization allows us to access an arbitrary set of N pixels along a scan-line of the
display. This would modify the inner loop of BitBlt to jump by N pixels instead of one. We shall use
the term RasterWord(x,y) to refer to the word accessed with its left end located at (x,y). The
operation Align aligns the pixels read at (x1,y1) for writing at (x2,y2). This operation requires the
movement of data between memory chips and shall be the topic of discussion in Section 4.3.

PROCEDURE BitBlt(xs,ys,xd,yd,w,h,op)
{Nx1 arbitrary access}
begin
    xdir := N; ydir := 1;
    if xs < xd then begin
        xdir := -N; xs := xs + w - N; xd := xd + w - N;
    end;
    if ys < yd then begin
        ydir := -1; ys := ys + h - 1; yd := yd + h - 1;
    end;
    y1 := ys; y2 := yd;
    for i := 0 to h-1 do begin
        x1 := xs; x2 := xd;
        for j := 0 to w/N do begin
            RasterWord(x2,y2) := Align(RasterWord(x1,y1))
                                 op RasterWord(x2,y2);
            x1 := x1 + xdir; x2 := x2 + xdir;
        end;
        y1 := y1 + ydir; y2 := y2 + ydir;
    end;
end;


The procedure in the above exposition updates only whole words and would hence result in an
error in the last iteration of the loop if the width w is not an exact multiple of N. The last iteration
should update w%N points; these points are left justified if xdir is positive and right justified if xdir
is negative. An N-bit variable mask is used to allow the update of partial words. The presence of a
0-bit in the appropriate position is required to update that bit of the display memory word.

PROCEDURE BitBlt(xs,ys,xd,yd,w,h,op)
{Nx1 arbitrary access with masking}
begin
    xdir := N; ydir := 1;
    endmask := (2^N - 1) rshift (w%N);
    if xs < xd then begin
        xdir := -N; xs := xs + w - N; xd := xd + w - N;
        endmask := (2^N - 1) lshift (w%N);
    end;
    if ys < yd then begin
        ydir := -1; ys := ys + h - 1; yd := yd + h - 1;
    end;
    y1 := ys; y2 := yd;
    for i := 0 to h-1 do begin
        mask := 0;    {all 0-bits: every pixel of a full word is updated}
        x1 := xs; x2 := xd;
        for j := 0 to w/N do begin
            if (j = w/N) then mask := endmask;
            {only pixels whose mask bit is 0 are written}
            RasterWord(x2,y2) := Align(RasterWord(x1,y1))
                                 op RasterWord(x2,y2);
            x1 := x1 + xdir; x2 := x2 + xdir;
        end;
        y1 := y1 + ydir; y2 := y2 + ydir;
    end;
end;
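The end-mask arithmetic is easy to get wrong, so a small sketch may help. It assumes, as the text states, that a 0-bit enables the write, and it additionally assumes that the most significant bit of the N-bit mask corresponds to the leftmost pixel of the word; the function names are hypothetical.

```python
def end_mask(w, n, right_to_left=False):
    """N-bit write mask for the final, partial word (0 bit = write this pixel)."""
    full = (1 << n) - 1
    r = w % n
    if right_to_left:
        # Truncate to n bits: zeros end up in the r rightmost positions.
        return (full << r) & full
    # Zeros end up in the r leftmost positions.
    return full >> r

def enabled_pixels(mask, n):
    """Positions within the word (0 = leftmost) whose mask bit is 0."""
    return [i for i in range(n) if not (mask >> (n - 1 - i)) & 1]

if __name__ == "__main__":
    n, w = 8, 11                      # 11 pixels wide: one full word plus 3 pixels
    print(format(end_mask(w, n), "08b"), enabled_pixels(end_mask(w, n), n))
    print(format(end_mask(w, n, True), "08b"),
          enabled_pixels(end_mask(w, n, True), n))
    print(enabled_pixels(end_mask(16, n), n))   # exact multiple: nothing written
```

When w is an exact multiple of N the final mask is all ones, so the extra iteration of the inner loop writes nothing.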


The Align operation for the Nx1 organization requires a circular shift of the word read
(Figure 4-6). The shift count is the difference in alignment of the two rectangles with respect to
the word boundary.
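A minimal model of this alignment, assuming that pixel x of a scan-line is stored in chip x % N and that a word is represented as a list indexed by chip number (both are modelling assumptions, as are the function names):

```python
def read_word(row, x, n):
    """Read n consecutive pixels starting at x; result is indexed by chip number."""
    word = [None] * n
    for k in range(n):
        word[(x + k) % n] = row[x + k]
    return word

def align(word, x1, x2, n):
    """Circularly shift a word read at x1 so that it can be written at x2."""
    shift = (x2 - x1) % n             # difference in alignment
    return [word[(chip - shift) % n] for chip in range(n)]

def write_word(row, x, word, n):
    """Write a chip-indexed word to the n pixels starting at x."""
    for k in range(n):
        row[x + k] = word[(x + k) % n]

if __name__ == "__main__":
    n = 8
    src = list(range(100, 132))
    dst = [0] * 32
    write_word(dst, 10, align(read_word(src, 5, n), 5, 10, n), n)
    print(dst[10:18] == src[5:13])
```

Each chip contributes exactly one pixel to any word, so the shift only changes which chip a pixel is routed to, never how many pixels a chip must supply.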


Figure 4-6: Aligning Nx1 rectangles


Of the mirroring and transposition transformations, the x-mirror merely changes the direction of
the source y-loop, and the y-mirror changes the direction of the x-loop and also requires the source
word to be mirrored in addition to being aligned with respect to the destination word. The
transposition transformation cannot make use of the parallelism offered by this memory organization
because this memory organization does not allow a word read along the x-direction to be written
along the y-direction.

4.2.2. Nx1 fixed access

In this memory organization the N pixels accessible in one memory access are forced to lie
between Nx1 boundaries of the screen (Figure 2-1). If the source and destination rectangles are
misaligned with respect to the fixed boundaries, then a destination word that can be written in one
memory cycle cannot be read from the source in one memory cycle. Similarly a source word that can
be read in one memory cycle cannot be written in one memory cycle into its destination. Because
some operations choose to read both source and destination and write a combined value into the
destination, we choose to read two adjacent source words and combine them for each destination
word (Figure 4-7). Notice that when we attempt to write the adjacent destination word we have to
once again read one of the source words read for the previous destination word. This can be used to
speed up the inner loop by saving one of the source words and only reading one source word for each
iteration in the inner loop. This organization can hence provide the same bandwidth as the
organization which allows arbitrary Nx1 accesses at essentially no extra cost.
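The saving described above can be sketched for a single scan-line. The code below is a simplified illustration, not the thesis procedure: it assumes the destination start xd and the width w are multiples of N, that source and destination are different scan-lines (so overlap and partial-word masking do not arise), and it models a memory access as a slice of one aligned word. The function name is hypothetical.

```python
def copy_row_fixed(src_row, dst_row, xs, xd, w, n):
    """Copy w pixels from xs to xd using only word-aligned accesses.

    Returns the number of source words read.
    """
    assert xd % n == 0 and w % n == 0
    offset = xs % n                     # misalignment of the source
    first = xs // n                     # index of the first source word
    saved = src_row[first * n:(first + 1) * n]   # one read before the loop
    reads = 1
    for k in range(w // n):
        # One new source read per destination word.
        nxt = src_row[(first + k + 1) * n:(first + k + 2) * n]
        reads += 1
        # Tail of the saved word followed by the head of the new word.
        dst_row[xd + k * n:xd + (k + 1) * n] = saved[offset:] + nxt[:offset]
        saved = nxt                     # reuse for the next destination word
    return reads

if __name__ == "__main__":
    src = list(range(40))
    dst = [-1] * 40
    reads = copy_row_fixed(src, dst, xs=6, xd=8, w=12, n=4)
    print(dst[8:20] == src[6:18], reads)
```

For three destination words the sketch performs four source reads rather than six, which is the effect of carrying one source word over between iterations.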


[Figure 4-7 labels: Word 1, Word 2]


4.2.3. MxM arbitrary access

Small area BitBlts (characters for example) prefer a square memory organization because updates
in such an organization can be performed in fewer memory cycles. The MxM arbitrary access
memory organization allows the access and update of an arbitrary MxM square on the display. The
BitBlt procedure is very similar to the BitBlt procedure in the Nx1 arbitrary access organization. Two
masks xmask and ymask are ORed to produce the MxM mask which is used as the write-enables for
the memory chips (Figure 4-8). The operation Align aligns the pixels read at (x1,y1) for writing at
(x2,y2).
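Combining the two one-dimensional masks can be sketched as follows. The representation is an assumption: each mask is a list of M bits, a 0 bit enables the write (the convention stated for the Nx1 mask), and the function name square_mask is hypothetical.

```python
def square_mask(xmask, ymask):
    """OR a column mask and a row mask into an M x M write mask (0 = write)."""
    return [[ybit | xbit for xbit in xmask] for ybit in ymask]

if __name__ == "__main__":
    xmask = [0, 0, 0, 1]      # only the three leftmost columns are written
    ymask = [0, 0, 1, 1]      # only the two topmost rows are written
    for row in square_mask(xmask, ymask):
        print(row)
```

With active-low enables, ORing the masks writes a pixel only when both its row and its column are enabled, so the enabled region is always a rectangle anchored at one corner of the square.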


Figure 4-8: Masking MxM squares


PROCEDURE BitBlt(xs,ys,xd,yd,w,h,op)
{MxM arbitrary access with masking}
begin
    xdir := M; ydir := M;
    endxmask := (2^M - 1) rshift (w%M);
    endymask := (2^M - 1) rshift (h%M);
    if (xs < xd) then begin
        xdir := -M; xs := xs + w - M; xd := xd + w - M;
        endxmask := (2^M - 1) lshift (w%M);
    end;
    if (ys < yd) then begin
        ydir := -M; ys := ys + h - M; yd := yd + h - M;
        endymask := (2^M - 1) lshift (h%M);
    end;
    y1 := ys; y2 := yd;
    ymask := 0;    {all 0-bits: every row of a full square is updated}
    for i := 0 to h/M do begin
        if (i = h/M) then ymask := endymask;
        xmask := 0;    {all 0-bits: every column of a full square is updated}
        x1 := xs; x2 := xd;
        for j := 0 to w/M do begin
            if (j = w/M) then xmask := endxmask;
            {xmask OR ymask gives the write-enables; 0 enables the write}
            RasterWord(x2,y2) := Align(RasterWord(x1,y1))
                                 op RasterWord(x2,y2);
            x1 := x1 + xdir; x2 := x2 + xdir;
        end;
        y1 := y1 + ydir; y2 := y2 + ydir;
    end;
end;


This BitBlt procedure comprises two nested loops, each of which jumps by M pixels. The inner
loop moves along the x-direction while the outer loop moves along the y-direction. The fast
movement along the x-direction is necessitated by the fact that most of the time the image being
scrolled contains text, which is expected to change from top to bottom and not from left to right. This
loop structure provides the expected effect. If BitBlt were fast enough that the direction of movement
were unnoticeable, we would then have to worry about the CRT video scanning interacting with
the BitBlt memory updating. So if we were actually scrolling the whole display's contents and the
inner loop incremented in the y-direction, then the lower half of the display may appear changed
while the upper half may still be unchanged. Choosing the inner loop to move in the x-direction and
properly synchronizing to the video scan can in fact make the scrolling appear to be instantaneous.
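The procedure reverses a loop direction whenever the source lies before the destination (xs < xd or ys < yd), so that an overlapping copy never overwrites source pixels before they have been read. A minimal one-dimensional sketch of that rule follows; it works on single pixels rather than M-pixel words, and the name `copy_span` and the list representation of a scan-line are assumptions made only for illustration.

```python
def copy_span(row, xs, xd, w):
    """Copy w pixels of `row` from position xs to position xd, in place.

    When the source starts before the destination the copy runs from the
    far end backwards, mirroring the `xdir := -M` case of the procedure.
    """
    if xs < xd:
        indices = range(w - 1, -1, -1)   # right to left
    else:
        indices = range(w)               # left to right
    for i in indices:
        row[xd + i] = row[xs + i]


if __name__ == "__main__":
    line = list(range(8))
    copy_span(line, 0, 2, 4)
    print(line)
```

Copying in the other direction for the same arguments would propagate the first two pixels repeatedly instead of moving the whole span.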

As in the case of the Nx1 organization, the x-mirror and y-mirror transformations change the
directions of the loops and require the ability to mirror MxM words, although as we have discussed
before the source and destination rectangles cannot overlap. The transposition transformation can be
done by having the source rectangle loop in one direction and the destination rectangle loop in the
other direction and having the ability to transpose single MxM words.




The alignment operation in the square organization requires a two-dimensional circular shift
(Figure 4-9). The shift counts in each dimension are the alignment differences between the source
and destination rectangles along those dimensions. The pixels marked in the figure by a small circle,
small square, and a large square are read from memory chips marked by the same symbols, and
written into a different set of memory chips. The two-dimensional circular shift aligns these pixels
from the memory chips in the source rectangle to the ones in the destination rectangle.


[Figure: a source square and a destination square on the screen, and the two-dimensional circular
rotation that maps between them in the memory array]

Figure 4-9: Aligning MxM squares
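A two-dimensional circular shift of an MxM word can be sketched as follows. The function name `rotate_2d` and the nested-list representation are assumptions for illustration only; the hardware performs the same permutation between memory chips rather than on a list.

```python
def rotate_2d(word, sx, sy):
    """Circularly shift an MxM word by sx columns and sy rows.

    The pixel at (row, col) moves to ((row + sy) mod M, (col + sx) mod M).
    The shift counts play the role of the alignment differences between
    the source and destination squares.
    """
    m = len(word)
    return [[word[(r - sy) % m][(c - sx) % m] for c in range(m)]
            for r in range(m)]


if __name__ == "__main__":
    square = [[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]]
    for row in rotate_2d(square, 1, 1):
        print(row)
```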


4.2.4.  MxM  fixed  access 


When the square organization is restricted to access words that have to be aligned to MxM word
boundaries, then the inner loop of BitBlt has to combine four adjacent source words to update one
destination word. The optimization technique used for the Nx1 fixed access organization is still
applicable because when we attempt to write adjacent words, three out of the four source words
required would probably have already been read to update previous destination words. However,
saving these could be a problem because a destination word needs the current source word to be
combined with source words both from the previous column and the previous row. The word from the
previous column was read in the previous memory cycle and is hence readily available even if we save
only one word of memory. The words in the previous row could have been read a full x-scan before and
saving them could potentially involve storing M complete scan-lines. If we discard this possibility,
but save the word read in the previous memory access, then we lose 1/4 the memory bandwidth on the
average. We can still use the entire memory bandwidth when the source and destination rectangles are
exactly word aligned in the y-direction (during a horizontal scroll for example). But if the two
rectangles are misaligned by exactly M/2, then only 1/2 the bandwidth can be used.

4.3. BitBlt communication

The transformation operations required by all BitBlt algorithms require the movement of data
between memory chips in order to align reads and writes. This section shall discuss the requirements
for the scan-line and square memory organizations.

4.3.1.  Nxl  communication 

The Nx1 memory organization requires N memory chips which can be used in parallel to read or
write N pixels which lie along a scan-line of the display. Depending upon the addressing scheme used,
these N pixels can be constrained to lie within a fixed Nx1 grid on the display or be free to originate
at any arbitrary pixel. In both these variations of the Nx1 memory organization, a circular shift is
required to align an Nx1 word so that it can be updated at a different location. The shift count is the
alignment difference between the two words. The y-mirror operation also requires the ability to
mirror the words.

There are several alternatives for implementing these transformations which vary in speed and
space required in terms of logic and pins. There are two classes of implementation alternatives. The
first class of alternatives performs the transformations externally after reading the word to be rotated
from the memory and then writing the rotated word back. The second manipulates the data within
the memory chips by connecting them suitably to each other.

The  obvious  external  implementation  takes  the  N  pixels  from  the  N  memory  chips,  performs  the 
shifting/mirroring  externally  and  returns  the  N  transformed  pixels  to  the  memory  (Figure  4-10).  The 
external  rotation  can  be  performed  using  a  barrel  shifter  or  one  of  the  other  well  known  shifting 
schemes.  The  mirroring  merely  adds  another  level  of  multiplexing. 

The  most  obvious  internal  implementation  connects  the  memory  chips  to  both  their  adjacent 
neighbors  and  performs  the  shifting  by  sequentially  shifting  the  data  from  chip  to  chip  (Figure  4-11). 


Figure 4-10: External transformation in the Nx1 organization

This simple scheme requires 2g pins in each memory chip (g is the number of bits/pixel) and takes a
maximum of N/2 steps to perform. The average number of steps required is N/4. Adding more pins
can improve this performance (Section 4.3.3).
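The N/2 and N/4 figures can be checked by counting, for every possible shift count, how many neighbour-to-neighbour moves the shorter way around the ring requires. The sketch below assumes that all N shift counts are equally likely; the function name `ring_shift_steps` is hypothetical.

```python
def ring_shift_steps(n):
    """Return (maximum, average) neighbour steps for a circular shift on N chips.

    A shift by d positions can go either way around the ring, so it costs
    min(d, n - d) single-position moves. The average is taken over all n
    shift counts, assumed equally likely.
    """
    steps = [min(d, n - d) for d in range(n)]
    return max(steps), sum(steps) / n


if __name__ == "__main__":
    for n in (16, 64):
        print(n, ring_shift_steps(n))
```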


Figure 4-11: Neighbour connections in the Nx1 organization


4.3.2.  MxM  communication 

The MxM memory organization is implemented using $M^2$ memory chips and provides the ability
to read MxM squares of the display. The memory can be addressed to provide only word-aligned
squares or arbitrary squares, with the additional variation of staggering the squares to also provide
horizontal spans. The square organization requires a two-dimensional circular shift to align an MxM
word with a different word (the situation for the staggered addressing is slightly different, in which
case a y-directional shift of selected columns is also necessary). The rotation counts in each dimension
depend upon the alignment differences in those dimensions. In addition we also need the x-mirror,
y-mirror, and transposition transformations to implement the mirroring and transposition.

There exist several possibilities for implementing the transformations external to the memory chip
array. The shifting can be performed sequentially in M-bit segments which would take M time steps.
A parallel implementation using 2M shifters, M for one dimension and M for the other, can
implement the shifting in one time step. A slightly different implementation takes two time steps by
time-multiplexing between one set of M shifters, but the cost of the multiplexing makes this scheme
as expensive as the previous one. Any one of these shifting schemes has to be followed by
multiplexing to implement the mirroring and transposition transformations.

An attractive connection mechanism for performing this transformation uses only two sets of M
wires to connect the $M^2$ memory chips (Figure 4-12). The dark set of wires can be used to read out
any row or column, and the dotted wires can be used to write into any row or column. The external
network can shift/mirror the row or column read. This technique requires only 2g wires in each
memory chip but takes M time steps. Only four control wires per memory chip are required to
control this mechanism. Two of these wires are bussed along rows and the other two are bussed along
columns. One of each selects whether the chip should enable its data onto the dark wire and the
second selects whether the chip should latch the contents of the dotted wire.




A simpler connection mechanism connects each memory to its four immediate neighbours (Figure
4-13). The mirroring/transposition requires appropriate multiplexing at the edges. This technique
requires 4g data wires per memory chip and only two control wires which encode the direction of
transfer. This mechanism takes a maximum of M and an average of M/2 steps to perform the two
dimensional shifting required to align the data. Mirroring and transposition always require M time
steps.
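The maximum of M and the average of M/2 for the four-neighbour scheme follow from adding the circular shift distances in the two dimensions. The sketch below assumes that every (x, y) misalignment is equally likely and that one step moves the whole word by one position in one of the four directions; `mesh_shift_steps` is a hypothetical name.

```python
def mesh_shift_steps(m):
    """Return (maximum, average) steps for a 2-D circular shift on an MxM array
    whose chips are connected only to their four immediate neighbours.

    A shift by (dx, dy) costs min(dx, m - dx) + min(dy, m - dy) steps, since
    each step moves the data one position along a single axis.
    """
    ring = [min(d, m - d) for d in range(m)]
    steps = [rx + ry for rx in ring for ry in ring]
    return max(steps), sum(steps) / len(steps)


if __name__ == "__main__":
    for m in (4, 8):
        print(m, mesh_shift_steps(m))
```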

Figure 4-13: Neighbour connections in the MxM organization

The neighbor connection mechanism can be made more efficient by using tri-state wires and using
the scheme shown in Figure 4-14. This improvement allows each chip to communicate with any of its
eight neighbors. The two dimensional shifting now only requires a maximum of M/2 and an average
of M/4 steps. Mirroring and transposition still require M steps. The control now needs to
encode the eight possible directions of data transfer.


Figure 4-14: Tri-state neighbour connections in the MxM organization




All the above connections need another wire for enabling or disabling the transfer when the
staggered addressing is being used in order to remove the stagger after accessing an MxM square.

4.3.3.  Non-neighbor  communication 

Non-neighbor communication can be used to achieve higher performance for the circular shifting
operation. Section 4.3.1 showed that in the nearest neighbor communication paths of the Nx1
memory organization, a shift takes a maximum of N/2 and an average of N/4 steps. If we increase the
number of connections to four neighbors, then the question is what neighbors to choose. Symmetry
arguments dictate that the connections on one side will be the same as the ones on the other. One
choice of connections is to choose the immediate neighbors and the ones $\sqrt{N}$ away. Using this
interconnection scheme, an arbitrary location can be reached in a maximum of $\sqrt{N}$ moves and an
average of $\sqrt{N}/2$ moves.

We can, however, always improve on the $\sqrt{N}/2$ bound for the average number of steps by a
constant factor. Assume that the neighbors are at a distance of a and b in both directions. In one shift
step, the data can be moved from point 0 to -a, -b, a, and b. In two steps, the data can reach -2a,
-a-b, -a+b, -2b, -b+a, 2a, a+b, and 2b. In general the i-th step reaches 4i points. If all these
points were indeed disjoint, then k steps would span a total of

$$2k^2 + 2k + 1$$

points. If this total were equal to N, then k is asymptotically equal to $\sqrt{N/2}$, and the average number
of steps to reach any point is $\sqrt{2N}/3$. This bound is only slightly better than $\sqrt{N}/2$, but can almost
always be achieved. Table 4-3 shows the values of a and b which result in the minimum number of
moves for different values of N. It also tabulates the average number of moves Av, and the ratio
$Av/\sqrt{N}$, which asymptotically approaches $\sqrt{2}/3$.
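The asymptotic values quoted here follow from a short calculation, sketched below under the same assumption that the 4i points reached at step i are all distinct:

```latex
\begin{align*}
1 + \sum_{i=1}^{k} 4i &= 2k^2 + 2k + 1 = N
  \quad\Longrightarrow\quad k \approx \sqrt{N/2}, \\
\text{average steps} &= \frac{1}{N}\sum_{i=1}^{k} i \cdot 4i
  = \frac{2k(k+1)(2k+1)}{3N}
  \approx \frac{4k^3}{3 \cdot 2k^2}
  = \frac{2k}{3}
  = \frac{\sqrt{2N}}{3}.
\end{align*}
```

Dividing the last expression by $\sqrt{N}$ gives the limiting ratio $\sqrt{2}/3 \approx 0.471$.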

These results can be extended to more neighbors and also to two-dimensional neighbors. In the
extreme case each memory chip can be connected to all other memory chips, in which case all shifts
take only one step.




  N     a     b     Av          Av/$\sqrt{N}$
  8     1     2     1.250000    0.441942
 16     1     6     1.812500    0.453125
 24     3     4     2.208333    0.450774
 32     1     7     2.625000    0.464039
 40     4     5     2.900000    0.458530
 48     1    20     3.229167    0.466090
 56     1    10     3.482143    0.465321
 64     1    14     3.718750    0.464844
 72     1    11     3.972222    0.468131
 80     1    22     4.175000    0.466779
 88     1    26     4.375000    0.466377
 96     1    42     4.593750    0.468848
104     1    16     4.778846    0.468604
112     7     8     4.937500    0.466550
120     1    14     5.133333    0.468607
128     1    15     5.312500    0.469563

Table 4-3: Average moves for four neighbor connections.
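Individual rows of Table 4-3 can be reproduced with a breadth-first search over the ring of N chips, where each chip is connected to the chips at distance a and b on either side. This is a sketch for checking entries, not the procedure used to build the table; the name `average_moves` is hypothetical, and the average is taken over all N positions including the zero shift, which matches the tabulated values for the rows checked.

```python
from collections import deque


def average_moves(n, a, b):
    """Average number of moves needed to reach each of the n ring positions
    from position 0 when one move shifts by +a, -a, +b or -b (mod n)."""
    dist = [None] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        p = queue.popleft()
        for step in (a, -a, b, -b):
            q = (p + step) % n
            if dist[q] is None:
                dist[q] = dist[p] + 1
                queue.append(q)
    if any(d is None for d in dist):
        raise ValueError("some positions cannot be reached with these offsets")
    return sum(dist) / n


if __name__ == "__main__":
    for n, a, b in ((8, 1, 2), (16, 1, 6), (32, 1, 7)):
        av = average_moves(n, a, b)
        print(n, a, b, av, av / n ** 0.5)
```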


4.3.4.  Overlapping  communication  with  memory  access 

The inner loop of all the algorithms for BitBlt requires an Align operation between a pair of read
and write memory cycles. In the presentation of the algorithms, the read, align, and write operations
are performed sequentially. A slow alignment mechanism hence becomes a bottleneck of BitBlt
performance. For BitBlt areas much larger than the area accessed in a memory cycle, the three
operations are performed repeatedly. An optimization of the inner loop is to overlap a read and write
operation with an align operation. A sequence of

Read, Align, Write, Read, Align, Write, Read, Align, Write, Read, Align, Write

is now transformed into

Read, (Align/Read), (Write/Align), Read, (Write/Align), Read, (Write/Align), Write

If an align operation takes longer than a write followed by a read memory cycle, then the memory is
idle for some of the time. Conversely, the align operation does not have to be shorter than a write and
read memory cycle.

In the case of the MxM organization for M = 8, the tri-state neighbor connection mechanism takes
a maximum of 4 steps for alignment. This is approximately the same as the time required for two
memory cycles, which take 2 steps each because of the multiplexed addressing.
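The benefit of overlapping can be estimated with a simple timing model. The model below is an assumption made for illustration, not a description of any particular hardware: the align of word i runs concurrently with the write of word i-1 and the read of word i+1, and each such slot lasts as long as the slower of the two activities. The function names are hypothetical.

```python
def sequential_time(n, t_mem, t_align):
    """Time for n words when read, align and write never overlap."""
    return n * (t_mem + t_align + t_mem)


def overlapped_time(n, t_mem, t_align):
    """Time for n words when each align overlaps the previous word's write
    and the next word's read (assumed pipeline model)."""
    total = t_mem                      # first read
    for i in range(1, n + 1):
        mem_work = 0
        if i > 1:
            mem_work += t_mem          # write of the previous word
        if i < n:
            mem_work += t_mem          # read of the next word
        total += max(t_align, mem_work)
    return total + t_mem               # final write


if __name__ == "__main__":
    # M = 8 example from the text: 4-step align, 2-step memory cycles.
    print(sequential_time(4, 2, 4), overlapped_time(4, 2, 4))
```

With these numbers the align time equals one write plus one read, so in the steady state the memory is never idle and the alignment is completely hidden.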




4.4. BitBlt performance

This section discusses the performance of BitBlt for the various memory organizations described in
this chapter. The memory organizations can be compared only for the same cost measure, which can
be considered to be the number of chips in the memory system. The Nx1 organization uses N
memory chips and the MxM organization uses $M^2$ memory chips. We shall base all comparisons on
two different cost measures, one with 16 memory chips and one with 64 memory chips. N has a value
of 16 and 64 in the two cost measures for the Nx1 organization while M has a value of 4 and 8 for the
MxM organization.

We shall use two criteria for comparison, the first criterion based on the performance of small area
BitBlts (e.g., 8x10 characters), and the second based on large area ones (e.g., scrolling a full 1024x1024
display). If the alignment operation is overlapped with the memory cycles, then the performance is
directly related to the percentage of the memory bandwidth that can be used for updates.


4.4.1. Small area BitBlt


An  8x10  character  will  be  used  to  illustrate  the  performance  of  small  area  BitBlts  in  the  various 
memory  organizations.  We  shall  first  discuss  the  memory  systems  using  16  memory  chips  and  then 
discuss  the  larger  memory  system  with  64  memory  chips. 


In  a  16  chip  memory  system: 

•  with a 16x1 fixed access memory organization, updating an 8x10 character will take 10 or
20 memory cycles depending on whether the character crosses the word boundary or not.
The probability of the character crossing the boundary is about 7/16, and hence the
average number of memory cycles required is 14.375.

•  with a 16x1 arbitrary access memory organization, updating an 8x10 character will always
take 10 memory cycles.

•  with the 4x4 fixed access memory organization, updating an 8x10 character will take a
minimum of 6 memory cycles when the word boundaries match with the character and a
maximum of 12 memory cycles in the pathetic case of all boundaries mismatching.
Assuming that all pixel locations are equally likely, the average number of memory cycles
is 8.9.

•  with the 4x4 arbitrary access memory organization, updating an 8x10 character always
takes 6 memory cycles.


In a 64 chip memory system:

•  with a 64x1 fixed access memory organization, updating an 8x10 character will probably
take 10 memory cycles but it might take 20 in the unlikely case of the character crossing
the word boundary. Taking probabilities into account the character can be written in an
average of 11.1 memory cycles.

•  with a 64x1 arbitrary access memory organization, updating an 8x10 character will always
take 10 memory cycles.

•  with an 8x8 fixed access memory organization, updating an 8x10 character takes a
minimum of 2 memory cycles, a maximum of 6 memory cycles, and an average of 4.1
memory cycles.

•  with an 8x8 arbitrary access memory organization, updating an 8x10 character will always
take 2 memory cycles.


These results are summarized in Table 4-4.


Memory chips              16      64
Nx1 fixed access          14.4    11.1
Nx1 arbitrary access      10      10
MxM fixed access          8.9     4.1
MxM arbitrary access      6       2

Table 4-4: Average number of memory cycles to update an 8x10 character,
for different memory organizations with 16 and 64 memory chips.
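The fixed access averages can be recomputed by enumerating every possible alignment of the character relative to the word grid. The sketch below assumes, as the text does for the 4x4 case, that all pixel positions are equally likely, and that the horizontal and vertical alignments are independent; the function names are hypothetical.

```python
from fractions import Fraction


def average_words(length, word):
    """Average number of fixed words of size `word` touched by a run of
    `length` pixels, over all `word` possible starting offsets."""
    total = 0
    for offset in range(word):
        total += (offset + length - 1) // word + 1
    return Fraction(total, word)


def expected_cycles(width, height, word_w, word_h):
    """Expected memory cycles to update a width x height area when memory
    is accessed in grid-aligned word_w x word_h words."""
    return average_words(width, word_w) * average_words(height, word_h)


if __name__ == "__main__":
    for word_w, word_h in ((16, 1), (64, 1), (4, 4)):
        print(word_w, word_h, float(expected_cycles(8, 10, word_w, word_h)))
```

For the 8x8 fixed access case the same model yields about 3.98, slightly below the 4.1 quoted above, so that figure may rest on a somewhat different assumption.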


4.4.2. Large area BitBlt

Scrolling a large display region will be analyzed to compare the performance of various memory
organizations for large-area BitBlts. Large-area BitBlts repeat the inner loop a large number of times
and the end-conditions of the area do not significantly affect the performance. The performance can
be essentially compared based on the utilization of the memory accesses in the inner loop.

The inner loops of BitBlt update

•  N pixels in the Nx1 fixed access memory organization. The BitBlt procedure has to save N
pixels from consecutive memory accesses in order to maintain the 100% utilization.

•  N pixels in the Nx1 arbitrary access memory organization. No buffering is necessary. The
memory organization hence utilizes 100% of the memory bandwidth.

•  an average of 3/4 of the MxM pixels in the MxM fixed access memory organization
when we buffer only one MxM word. The memory organization only utilizes 75% of the
memory bandwidth.

•  all of the MxM pixels in the MxM arbitrary access memory organization, and no buffering
is required. This memory organization also utilizes 100% of the memory bandwidth.

4.4.3.  Analysis 

From the discussion of large area BitBlts, it is obvious that if we choose the MxM organization
then we should organize the addressing to access arbitrarily aligned squares, and if we were to choose
the Nx1 organization we could use either the fixed or the arbitrary access addressing. From the
discussion of small area BitBlts, we can conclude that for the 16 chip system all memory organizations
are approximately equally good within a factor of 2, and for the 64 chip system the square
organization is preferable to the linear organization by almost a factor of 5 for characters.

The overall conclusion is that if we were designing a cheap low bandwidth memory system, we
would choose the linear fixed access memory organization. If we were designing a high bandwidth
system we would use the square arbitrary access memory organization. These design decisions are
indeed true for the SUN work-station [Bechtolsheim 80] which uses the 16x1 fixed access memory
organization, and the 8x8 display [Sutherland 81] which uses the square arbitrary access memory
organization.


LINE  DRAWING 




Chapter  5 
Line  Drawing 

Line drawing is perhaps the most frequent of the graphical operations used to create illustrations
on computer displays. It is the only operation provided by calligraphic displays; all images displayed
on such displays have to be created as line drawings. On raster displays, lines are approximated by
intensifying an appropriate set of pixels. The desire to achieve extremely high line-drawing speeds
motivated this study of a variety of different algorithms that can be used to draw lines. These
algorithms can also be used for other scan-conversion tasks such as the filling of polygons and the
drawing of parametric curves.

Displaying a set of pixels by merely turning them on is the most common form of displaying lines.
Such lines are called bit-map lines, and are drawn by displaying the pixels that lie close to the desired
line. Such lines show "jaggies" when the lines jump between rows or columns of pixels. This
jagginess is an artifact of the sampling error that occurs when a signal is not sampled at the
appropriate resolution, and is known as aliasing [Crow 76]. Bit-map lines can be smoothed by using
grayscale pixels and intensifying these pixels at different intensities to produce aesthetically
improved lines. Intermediately intensified pixels can hence be used to effectively smooth
the row or column jumps observed in bit-map lines. Such lines are called grayscale or anti-aliased
lines. This chapter discusses bit-map lines, while anti-aliasing is the topic of the next two chapters.

To achieve very high line-drawing speeds, we need the ability to update several pixels in parallel.
The various memory organizations we discussed earlier provide this ability. We now need algorithms
which can allow us to use this ability. Conventional line-drawing algorithms draw out the line by
sequentially determining the location of the next pixel to be intensified. This chapter will discuss
various algorithms that generate more than one pixel at a time, and their applicability under the
various memory organizations.

We shall restrict all line-drawing discussions to lines drawn from the coordinates (0,0) to (dx,dy).
This assumption causes no loss of generality because lines with different origins are translations of
lines with the origin (0,0). We shall also assume that the lines drawn lie in the first octant, i.e.,
$0 \le dy \le dx$. This assumption also loses no generality, because lines in the other octants can be
generated using a combination of reflection and transposition transformations.

5.1.  Bit-map  lines 

A line in the first octant illuminates one pixel in each vertical column. This assumption ensures a
line of nearly constant width and brightness. So the line drawing problem is to compute the value of
$y_i$ for each $x_i$ between 0 and dx. The optimal point to be illuminated is the one which lies closest to
the line. The closest point is the one that has the smallest perpendicular distance to the line. However,
for any given line the vertical distance is related to the perpendicular distance by a constant factor
($= dx/\sqrt{dx^2 + dy^2}$). Consequently, minimizing the vertical distance to the line is sufficient to
produce the optimal line. The vertical distance is minimized by rounding the actual y-value of the
line at $x_i$ to produce the value of $y_i$. This leads to the following obvious algorithm which
enumerates the optimal line.

for xi := 0 to dx do begin
    yi := round((dy/dx) * xi);
    display(xi,yi);
end

The  maximum  vertical  error  for  this  algorithm  is  hence  1/2. 
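A runnable sketch of this rounding algorithm is given below. It is restricted to the first octant, and it makes one assumption that the pseudocode leaves open: exact halves are rounded upward, computed in integer arithmetic. (Python's built-in `round` rounds halves to the nearest even value, so it is deliberately avoided.) The function name `rounded_line` is hypothetical.

```python
def rounded_line(dx, dy):
    """Pixels of the line from (0, 0) to (dx, dy), one per column, obtained by
    rounding the exact y-value of the line at each x (halves round up)."""
    if not 0 <= dy <= dx:
        raise ValueError("line must lie in the first octant: 0 <= dy <= dx")
    if dx == 0:
        return [(0, 0)]
    # floor(dy*x/dx + 1/2) written with integers only
    return [(x, (2 * dy * x + dx) // (2 * dx)) for x in range(dx + 1)]


if __name__ == "__main__":
    print(rounded_line(8, 3))
```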




Simple program transformations can be used to transform this algorithm to the well known line
drawing algorithms [Sproull 81]. The first transformation of this series changes the multiplication
into an incremental addition such that the value of $y_{i+1}$ is computed by incrementing the value of
$y_i$. For lines in the first octant the point in the next column is either horizontally or diagonally across
from the current column (see Figure 5-1). The decision of whether it is horizontally or diagonally
across depends upon the value of $e_i$, which is the vertical distance between the value of the line at
$x_{i+1}$ and $y_i$. If this value is greater than or equal to 1/2 then the next point is diagonally across,
else it is horizontally across. The value of $e_i$ is updated appropriately depending upon the direction
of the move. The following algorithm reflects this change to an incremental algorithm.
yi := 0;
ei := (dy/dx);  { initial error at xi }
for xi := 0 to dx do begin
    display(xi,yi);
    if (ei >= 1/2) then begin
        yi := yi + 1;
        ei := ei + (dy/dx) - 1;
    end
    else begin
        ei := ei + (dy/dx);
    end
end


Substituting $r = 2(e_i - 1/2)\,dx$ converts this algorithm into the well known Bresenham algorithm
[Bresenham 65].

yi := 0;
r := 2*dy - dx;
for xi := 0 to dx do begin
    display(xi,yi);
    if (r >= 0) then begin
        yi := yi + 1;
        r := r + (2*dy - 2*dx);
    end
    else begin
        r := r + (2*dy);
    end
end

The Bresenham algorithm is an integer algorithm which involves only one addition and comparison
in the inner loop (the expressions in parentheses can be precomputed). It generates one point of the
optimal line in each iteration of the inner loop. The variable r is a scaled measure of the vertical
error, the distance from the line to the pixel illuminated.
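The Bresenham loop translates directly into integer-only code. The sketch below follows the prose in stepping diagonally when the error is greater than or equal to 1/2, which corresponds to the test r >= 0; the function name `bresenham_line` is hypothetical and the routine only handles the first octant.

```python
def bresenham_line(dx, dy):
    """Pixels of the first-octant line from (0, 0) to (dx, dy), generated with
    one integer addition and one comparison per column."""
    if not 0 <= dy <= dx:
        raise ValueError("line must lie in the first octant: 0 <= dy <= dx")
    diagonal_step = 2 * dy - 2 * dx    # precomputed increments
    horizontal_step = 2 * dy
    points = []
    y = 0
    r = 2 * dy - dx                    # scaled error, r = 2(e - 1/2)dx
    for x in range(dx + 1):
        points.append((x, y))
        if r >= 0:
            y += 1
            r += diagonal_step
        else:
            r += horizontal_step
    return points


if __name__ == "__main__":
    print(bresenham_line(8, 3))
```

For every first-octant line the result agrees with rounding dy*x/dx to the nearest integer with halves rounded up, which can be confirmed by comparing against the expression (2*dy*x + dx) // (2*dx).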


5.2.  Measure  of  appearance 

Bit-map lines show annoying jaggedness when they jump across rows or columns. Lines which are
close to being horizontal or vertical are much more annoying than the lines that are either exactly
horizontal or vertical or close to being diagonal. This section identifies some measures that can be
used to characterize the appearance of bit-map lines.

Before attempting to characterize the appearance of the lines produced by any given algorithm, a
minimum set of acceptance criteria has to be satisfied by all the lines produced by the algorithm. The
most important of these requirements is the matching of the two end-points, which means that the
two points between which the line is supposed to be drawn should definitely be displayed. Two other
requirements are that the line should be monotone and should not have any gaps or jumps. For the
line from (0,0) to (dx,dy) in the first octant ($0 \le dy \le dx$), these requirements impose the following
constraints:

1. $y_0 = 0$

2. $y_{dx} = dy$

3. For all $i$ such that $1 \le i \le dx$: $0 \le (y_i - y_{i-1}) \le 1$

The  Bresenham  algorithm  meets  all  these  requirements. 
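The three constraints are easy to test mechanically for the output of any candidate line-drawing routine. In the sketch below, `ys` is assumed to hold the displayed row for each column 0..dx; the function name `meets_acceptance_criteria` is hypothetical.

```python
def meets_acceptance_criteria(ys, dy):
    """Check the end-point, monotonicity and no-gap constraints for a
    first-octant line whose column i is displayed at row ys[i]."""
    if not ys:
        return False
    if ys[0] != 0:            # constraint 1: starts at the origin
        return False
    if ys[-1] != dy:          # constraint 2: ends at (dx, dy)
        return False
    # constraint 3: each column moves up by 0 or 1 row
    return all(0 <= ys[i] - ys[i - 1] <= 1 for i in range(1, len(ys)))


if __name__ == "__main__":
    print(meets_acceptance_criteria([0, 0, 1, 1, 2, 2, 2, 3, 3], 3))
```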

The algorithms typically used to draw lines are designed to minimize the error from the ideal line.
The Bresenham algorithm, for example, ensures that the maximum deviation from a line to any pixel
displayed is less than or equal to 1/2. The maximum deviation from the ideal line hence provides a
measure of the appearance of the lines produced. This measure is fairly good to the first order
because the same line drawing algorithms produce better-looking lines on displays with higher
resolutions, where the actual screen distance of the maximum deviation is the only parameter getting
reduced. The average deviation from the ideal line can be used as a measure of the distribution of the
error. Other statistical norms can also be used to measure this distribution.

Much of the unpleasant appearance of lines is due to the wavy appearance of the jaggies, which is
not captured by any of the measures we have discussed so far. One measure that captures the wavy
nature of signals is the Fourier Transform. The Fourier Transform of a wavy signal shows a peak at
the fundamental frequency of the wave. Figure 5-2 shows three different lines, the distribution of the
absolute vertical error over the lines, and the Fourier Transform of the error distributions. Notice that
the shallow line shows a low frequency wave, and the steep line shows a high frequency wavy
behaviour. The zero frequency (DC) component of the Fourier Transform is the average error from
the ideal line.
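The error spectrum of a single line can be sketched with a direct discrete Fourier transform. Several choices below are assumptions rather than details given in the text: the line is the rounded (half-up) first-octant line, the error is sampled at columns 0..dx-1, and the transform is normalised by the number of samples so that the DC term equals the average error. The function names are hypothetical.

```python
import cmath


def vertical_errors(dx, dy):
    """Absolute vertical error of the rounded line at columns 0..dx-1."""
    return [abs((2 * dy * x + dx) // (2 * dx) - dy * x / dx) for x in range(dx)]


def dft_magnitudes(samples):
    """Magnitudes of the discrete Fourier transform, divided by the number of
    samples so that component 0 is the mean of the samples."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * cmath.pi * k * j / n)
                    for j, s in enumerate(samples))) / n
            for k in range(n)]


if __name__ == "__main__":
    errors = vertical_errors(16, 4)
    spectrum = dft_magnitudes(errors)
    print(spectrum[0], max(range(1, 9), key=lambda k: spectrum[k]))
```

For dx = 16 and dy = 4 the error pattern repeats every four columns, so the largest non-DC component appears at frequency index 4; a shallower line moves that peak towards lower frequencies.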

The sum of the Fourier Transforms for all lines with different slopes (i.e., for a given dx, lines such
that $0 \le dy \le dx$) adds up to a constant because of the shift towards higher frequencies as the line
gets steeper. The sum is plotted in Figure 5-3. The zero frequency component of the sum (not shown in
the graph but indicated in the figure) indicates the average error for lines with all slopes. This sum of
the Fourier Transforms is a measure that will be used to compare different line drawing algorithms,
because it can be used to find out if the algorithm is adding an undesirable ripple onto the line.


Figure 5-2: Three different lines, their error distributions,
and the Fourier Transforms of the error distributions.

5.3.  Bit-map  lines  using  precomputed  strokes 

One obvious way to speed up the line drawing process is to compose longer lines using shorter
precomputed segments. The line drawing algorithms now have to determine which of the set of
segments to use and where to place them. This technique is particularly useful for displays where
lines have to be approximated using "characters" [Jordan 74], or in display architectures which allow
the updating of several pixels at a time, the subject of this thesis.




D.C.  value  =  63.00 


Figure  5-3:  Sum  of  Fourier  transforms  for  all  lines  with 
dy  =  128;  0  <  dx  <  dy 

Figure 5-4 shows an example line which has been drawn using two different strokes, each of which
is eight pixels long. In general we can attempt to draw lines using N-pixel strokes. For shallow lines
with slopes less than one, the N pixels of each stroke will lie in N different columns; conversely, for
the steeper lines having slopes greater than one, the N pixels of each stroke will be in N different
rows. In order to be able to update the display memory with the contents of the stroke, the memory
has to have the ability to update either N rows or N columns at a time. This ability is provided by the
NxN square memory organization.


We shall assume the ability to place the stroke at an arbitrary pixel boundary of the display. This
implies that all strokes can be defined such that the first pixel displayed is located at the origin (0,0) and
the translation of every point of the stroke can be used to position the stroke at any pixel boundary.




We shall once again restrict the discussion to lines in the first octant. Lines in the other octants can be
drawn either by using a combination of mirroring and transposing the strokes, or by using a larger set
of strokes.

Strokes in the first octant have the same properties as the lines in the first octant. Each stroke
extends over N columns, with each of the columns containing exactly one pixel. Each stroke can
hence be defined as an array of integers Stroke[0..N-1], such that the ith element of the array defines the y
position of the pixel in the ith column of the stroke. This stroke can then be positioned at any origin
(ox,oy) using the following procedure.

for x := 0 to (N-1) do begin
    display(ox+x, oy+stroke[x]);
end

Lines of arbitrary length can be drawn by truncating the final stroke to the desired length.
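The placement procedure and the truncation of the final stroke can be sketched in Python as follows. The function name `place_stroke` and the optional `length` parameter are hypothetical, introduced only for this illustration; the function returns the pixel coordinates instead of writing them to a display.

```python
def place_stroke(stroke, ox, oy, length=None):
    """Return the pixels lit by a stroke whose first pixel is placed at (ox, oy).

    stroke[x] is the y position of the pixel in column x of the stroke.
    If length is given, only the first `length` columns are used, which
    models truncating the final stroke of a line.
    """
    count = len(stroke) if length is None else min(length, len(stroke))
    return [(ox + x, oy + stroke[x]) for x in range(count)]
```

For example, `place_stroke([0, 0, 1, 1, 1, 2, 2, 3], 10, 5, length=3)` yields the first three pixels of the stroke translated to the origin (10, 5).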

A line drawing algorithm which draws lines using strokes does so by repeatedly choosing one from
a set of strokes and placing it at some position along the line. The algorithm also determines the total
number of strokes that have to be precomputed and stored. This number can be affected by the
primitives provided by the display; we have seen the effect of the presence or absence of mirroring
and transposition. The algorithm also determines the maximum number of strokes used to draw any
line.

Figure 5-5 shows two lines drawn using copies of one stroke only. The first 8 pixels of the line are
used repetitively to generate the line. These lines show jumps and non-monotonicities. These occur
because the slope of the line might be either less than or greater than the slope of the
stroke, and repetitive use of this stroke will result in either undershooting or overshooting the line.
This error will then have to be corrected by either jumping up or going back down again. The lines
produced in this manner are hence unacceptable.

One alternative is to use two strokes, one of which is slightly shallower than the line desired and
one which is slightly steeper. The two obvious ones are the ones which rise to the points just below
and just above the y-intercept of the line in the Nth column (see Figure 5-6). Repeated use of these
two strokes avoids the gap and monotonicity problems. Figure 5-7 shows a couple of lines drawn
using two strokes for each line. The algorithm now has to determine which of the two strokes to place
and where. There are two different ways in which this can be done. They differ in the amount of
computation performed and also result in slightly different lines. The first method uses an algorithm
that generates points along the line that are N pixels apart and then joins them using the
precomputed strokes. Hence this method would generate $x_0, x_N, x_{2N}, x_{3N}, \ldots$. The first line in


Figure 5-5: Line drawing using only one stroke.
The first line shows jumps, and the second shows non-monotonicities.


Figure 5-7 marks all such pixels. The second method also uses N-pixel strokes but generates both the
endpoints of each stroke, which are (N-1) pixels apart. It will hence generate
$x_0, x_{N-1}, x_N, x_{2N-1}, x_{2N}, x_{3N-1}, \ldots$ (marked in the second line in Figure 5-7). The algorithm used in the second method is
computationally more expensive because it has to generate more points, but it does result in slightly
better lines.


5.3.1. The N-step algorithm

We can modify Bresenham's algorithm to generate every Nth point of a line. The line can then be
drawn by joining these points using the precomputed strokes. Although there are N+1 points in the
stroke that joins two points that are N apart, we can get by using an N-point stroke if we assume that
the last point will be updated as the first point of the next stroke. Figure 5-8 shows the N+1 strokes
that are needed to join such a tuple of points. The last point does not have to be stored as part of the
stroke if we assume the discussed convention.


Figure 5-7: Line drawing using two different strokes.


The following obvious algorithm generates every Nth point of any given line.


Figure 5-8: Strokes used by the N-step algorithm for N=8.


for x_i := 0 to dx by N do begin
    y_i := round((dy/dx) * x_i);
    display(x_i, y_i);
end


Similar to the single step algorithm, this algorithm can be made to compute $y_i$ incrementally by
observing that the value of $y_i$ will either be incremented by $s$ ($= \lfloor N\,dy/dx \rfloor$) or by $s+1$
($= \lfloor N\,dy/dx \rfloor + 1$) for each increment of x. This can easily be seen by examining all possibilities for the
expression $\mathrm{round}((dy/dx)(x_i + N)) - \mathrm{round}((dy/dx)\,x_i)$. There are four possibilities for the
expression depending upon whether the subexpressions get rounded up or down. In all cases except
one the value of the expression is s. In the case when the first subexpression gets rounded up and
the second subexpression gets rounded down the value of the expression is s+1. The incremental
algorithm can be stated as the following.




y_i := 0;
e_i := N*(dy/dx);
for x_i := 0 to dx by N do begin
    display(x_i, y_i);
    if (e_i > (s+1/2)) then begin
        y_i := y_i + s + 1;
        e_i := e_i + N*(dy/dx) - s - 1;
    end
    else begin
        y_i := y_i + s;
        e_i := e_i + N*(dy/dx) - s;
    end
end


Substituting $r = 2(e_i - s - 1/2)\,dx$ transforms this algorithm into the following.

y_i := 0;
r := 2*N*dy - 2*s*dx - dx;
for x_i := 0 to dx by N do begin
    display(x_i, y_i);
    if (r > 0) then begin
        y_i := y_i + s + 1;
        r := r + (2*N*dy - 2*(s+1)*dx);
    end
    else begin
        y_i := y_i + s;
        r := r + (2*N*dy - 2*s*dx);
    end
end


This algorithm generates every Nth point of a given line. It can be used to draw the line if the
appropriate stroke is displayed at every point generated. The stroke displayed depends upon the
vertical displacement to the next point along the line, which we know to be either s or s+1. The line
can now be drawn using the following algorithm, in which the expression 2N dy - 2s dx has also been
replaced by a precomputed value a := 2N dy - 2s dx.

y_i := 0;
r := a - dx;
for x_i := 0 to dx by N do begin
    if (r > 0) then begin
        Display Stroke of height s+1 at (x_i, y_i);
        y_i := y_i + s + 1;
        r := r + a - 2*dx;
    end
    else begin
        Display Stroke of height s at (x_i, y_i);
        y_i := y_i + s;
        r := r + a;
    end
end


Although the values of s and a can be computed using multiplications and divisions, the following
incremental algorithm computes their values in a manner similar to Bresenham's algorithm. This
can hence be used as a prologue to the line drawing algorithm above.

s := 0;
a := 0;
r := dy - dx;
for i := 0 to (N-1) do begin
    a := a + 2*dy;
    if (r > 0) then begin
        s := s + 1;
        a := a - 2*dx;
        r := r - (dx - dy);
    end
    else r := r + dy;
end
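The prologue and the integer main loop can be combined into a small Python sketch. The function names are assumptions made for this example, and the strokes are only reported as (x, y, rise) tuples rather than displayed. One observation that is not stated in the text: when N dy/dx is an exact integer the prologue as written returns one less than that integer for s, and the main loop then always selects the stroke of height s+1, so the generated points are unchanged.

```python
def prologue(dx, dy, n):
    """Incrementally compute s and a = 2*n*dy - 2*s*dx without multiplying by n."""
    s, a, r = 0, 0, dy - dx
    for _ in range(n):
        a += 2 * dy
        if r > 0:
            s += 1
            a -= 2 * dx
            r -= dx - dy
        else:
            r += dy
    return s, a


def n_step_points(dx, dy, n):
    """Return (x, y, rise) for every stroke origin of a first-octant line."""
    s, a = prologue(dx, dy, n)
    y, r = 0, a - dx
    origins = []
    for x in range(0, dx + 1, n):
        if r > 0:
            origins.append((x, y, s + 1))  # steeper stroke
            y += s + 1
            r += a - 2 * dx
        else:
            origins.append((x, y, s))      # shallower stroke
            y += s
            r += a
    return origins
```

For dx = 100, dy = 85 and N = 8 this gives s = 6 and a = 160, and the first stroke origins are (0, 0), (8, 7) and (16, 14), which are the rounded values of the ideal line at those columns.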

Although this line drawing technique is faster than the single-bit algorithms, especially for
architectures that allow parallel updates, it does not produce optimal lines. The N spaced points that
it generates are optimal and hence the origin of each stroke can have a maximum error of only 1/2.
However, there can be another error of 1/2 within a stroke which could be in the same direction as
the stroke positioning error. However, the resulting total error at a point within a stroke can never
add up to exactly 1 although it can be arbitrarily close to it. A proof to this effect is provided in
Appendix A to this chapter. An example of an error of 0.9 for a line drawn using strokes is shown in
Figure 5-9. It shows that for a line with dx = 100 and dy = 85, the value of s = 6, and the error can
be observed at y = 66. Table 5-1 tabulates the maximum error for all possible lines with dx < 128 for
different values of N.

Although this line drawing technique produces non-optimal lines, the end-point of the line is
always exact. This is due to the fact that the maximum error at any point can never be greater than or
equal to one. Hence any point that will lie exactly on the line (the end-point is one such point) will be
computed correctly by this algorithm because an incorrect computation would imply an error greater
than or equal to one. Exhaustive simulations of the algorithm also verify that the end-points are
always correct and all lines also satisfy all other acceptability criteria.
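A small exhaustive simulation along these lines can be sketched in Python. It is only an illustration under stated assumptions: stroke origins are computed by direct rounding instead of the incremental loop, each stroke of a given rise is built by rounding j*rise/N, exact halves are rounded downward throughout, and the helper names are hypothetical.

```python
from fractions import Fraction


def round_half_down(p, q):
    """Nearest integer to p/q for q > 0, with exact halves rounded downward."""
    return (2 * p + q - 1) // (2 * q)


def n_step_line(dx, dy, n):
    """Return the y value drawn in every column 0..dx of a first-octant line."""
    ys = []
    for x0 in range(0, dx + 1, n):
        y0 = round_half_down(x0 * dy, dx)
        rise = round_half_down((x0 + n) * dy, dx) - y0
        # The final stroke is truncated at the end-point of the line.
        for j in range(min(n, dx - x0 + 1)):
            ys.append(y0 + round_half_down(j * rise, n))
    return ys


def max_vertical_error(dx_max, n):
    """Largest vertical error over all first-octant lines with dx <= dx_max."""
    worst = Fraction(0)
    for dx in range(1, dx_max + 1):
        for dy in range(dx + 1):
            for x, y in enumerate(n_step_line(dx, dy, n)):
                worst = max(worst, abs(y - Fraction(x * dy, dx)))
    return worst
```

Using exact fractions avoids floating-point doubt about whether an error reaches 1. For instance, the line with dx = 10 and dy = 3 drawn with N = 8 comes out as 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3: the pixel at x = 6 is 0.8 away from the ideal line, while the end-point is exact.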

Figure 5-10 shows two lines, the first of which is the optimal line drawn using the single-bit
algorithm; the second is drawn using the N-step algorithm where N=8. The second line has a wavy
behaviour because of the interaction between the slope of the line and the size of the stroke. This
behaviour is the main reason for the unacceptability of this line drawing algorithm. The Fourier
Transform measure for these lines shows undesirable peaks at the beat frequencies. This can be seen




plot and we can see peaks at certain frequencies, which correspond to the undesirable ripples in the
lines. Later in this chapter we shall discuss other stroke drawing algorithms which use more strokes
and reduce the undesirable ripple effect.


Figure 5-10: The first line uses a single-bit algorithm,
the second line uses the N-step algorithm with N=8.


5.3.2. The (N-1)-step algorithm

Since the end-points of an N point stroke are spaced N-1 points apart, we could compute points
along the line which would correspond to the actual end-points of the stroke and join these points
with the appropriate stroke. This would involve computing the locations of $x_0$ and $x_{N-1}$ as the
end-points of the first stroke, the locations of $x_N$ and $x_{2N-1}$ for the location of the second stroke and
so on. Notice that a line drawing algorithm of this variety would be computing twice as many points
as the N-step algorithm. The (N-1)-step algorithm uses N strokes (Figure 5-12) as opposed to the
N+1 used by the N-step algorithm.

The following obvious algorithm can be used.


D.C. value = 80.28

Figure 5-11: Sum of Fourier Transforms for the N-step algorithm for N=8.

Figure 5-12: Strokes used by the (N-1)-step algorithm for N=8.


for x_i := 0 to dx by N do begin
    y_i := round((dy/dx) * x_i);
    y_{i+N-1} := round((dy/dx) * x_{i+N-1});
    Display stroke of height (y_{i+N-1} - y_i)
        at (x_i, y_i);
end


If s is defined to be $\lfloor (N-1)\,dy/dx \rfloor$, then by the same argument used for the N-step algorithm, the
only two strokes used by any given line will have a height of either s or s+1. An equivalent
incremental algorithm can then be stated as the following.

y_i := 0;
e_i := (N-1)*dy/dx;
for x_i := 0 to dx by N do begin
    if (e_i > (s + 1/2)) then begin
        Display stroke of height (s+1) at (x_i, y_i);
        y_i := y_i + s + 1;
        e_i := e_i + (dy/dx) - s - 1;
    end
    else begin
        Display stroke of height s at (x_i, y_i);
        y_i := y_i + s;
        e_i := e_i + (dy/dx) - s;
    end
    if (e_i > 1/2) then begin
        y_i := y_i + 1;
        e_i := e_i + (N-1)*dy/dx - 1;
    end
    else
        e_i := e_i + (N-1)*dy/dx;
end


This can be transformed into the following integer algorithm, where $a = 2(N-1)\,dy - 2s\,dx$. The
values of s and a can be computed using a prologue similar to the one in the N-step algorithm.




y_i := 0;
r := a - dx;
for x_i := 0 to dx by N do begin
    if (r > 0) then begin
        Display stroke of height (s+1) at (x_i, y_i);
        y_i := y_i + s + 1;
        r := r + 2*dy - 2*dx;
    end
    else begin
        Display stroke of height s at (x_i, y_i);
        y_i := y_i + s;
        r := r + 2*dy;
    end
    if (r > 0) then begin
        y_i := y_i + 1;
        r := r + a - 2*dx;
    end
    else
        r := r + a;
end
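The integer form can be checked against direct rounding with a short Python sketch. It assumes $a = 2(N-1)\,dy - 2s\,dx$ with $s = \lfloor (N-1)\,dy/dx \rfloor$ computed here by integer division rather than by a prologue, and it assumes that exact halves round downward; the function names are hypothetical.

```python
def round_half_down(p, q):
    """Nearest integer to p/q for q > 0, with exact halves rounded downward."""
    return (2 * p + q - 1) // (2 * q)


def n_minus_1_step(dx, dy, n):
    """Return (x, y, rise) for each stroke placed by the (N-1)-step algorithm."""
    s = ((n - 1) * dy) // dx
    a = 2 * (n - 1) * dy - 2 * s * dx
    y, r = 0, a - dx
    strokes = []
    for x in range(0, dx + 1, n):
        # First step: choose the stroke joining columns x and x + n - 1.
        if r > 0:
            rise = s + 1
            r += 2 * dy - 2 * dx
        else:
            rise = s
            r += 2 * dy
        strokes.append((x, y, rise))
        y += rise
        # Second step: advance one column to the origin of the next stroke.
        if r > 0:
            y += 1
            r += a - 2 * dx
        else:
            r += a
    return strokes
```

Under these assumptions both end-points of every stroke coincide with the rounded ideal line, which is the property the two-step inner loop is meant to provide. The sketch says nothing about the error of the pixels inside a stroke.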


The (N-1)-step algorithm can result in some point with an error of 1 in the case when N is odd. A
proof for this is contained in Appendix B. In this case the algorithm will generate some lines with
incorrect end-points, which makes the algorithm unacceptable for odd values of N. Table 5-2
tabulates the maximum error for all possible lines with dx < 128 for different values of N.

N  Maximum  error 


Table  5-2:  Maximum  vertical  error  for  different  values  of  N. 


For even values of N this algorithm produces slightly better lines than the N-step algorithm. This
can be observed in the lines in Figure 5-13. The better appearance is verified by the sum of the
Fourier transforms for all possible lines with dx = 128. Figure 5-14 plots the Fourier transform and a
comparison with Figure 5-11 shows that this algorithm results in slightly lower extra ripple
frequencies.


Figure 5-13: The first line uses the single-bit algorithm;
the second uses the (N-1)-step algorithm for N=8.


Figure 5-14: Sum of Fourier Transforms for the (N-1)-step algorithm




5.4.  Better  lines  using  a  larger  number  of  strokes 

One way to explain the results of the two-stroke algorithms is to observe that only a single stroke is
displayed for each rise in y, while the optimum line uses one of several strokes with the same rise.
Figure 5-15 shows three of the eight different strokes possible for a stroke size of eight and rise of one
for a horizontal jump of eight. The stroke used for the optimum line depends upon the slope of the
line and the actual offset of the line at the origin of the stroke. For the three strokes shown in Figure
5-15, the first stroke would be optimal for a line with a slope of 1/8 and an offset of 0 at the beginning
of a stroke. The second stroke would be optimal for a slope of 1/8 and an offset of 1/8, the third
stroke for a slope of 1/10 and an offset of 0.

Figure 5-15: Three different strokes having the same rise
(slope = 1/8, offset = 0; slope = 1/8, offset = 1/8; slope = 1/10, offset = 0).

This suggests a possible method for improving the lines produced by the stroke drawing
algorithms. We could compute y to a higher precision at the N spaced points, and then use the
sub-pixel offset and the higher-precision value of the slope available to choose from a larger set of strokes.
As an example, one extra bit of precision would provide us with a half-pixel y position for every x
spaced N pixels apart. The N-step algorithm that would provide i extra bits of precision would be the
following, where $j = 2^i$, $s = \lfloor jN\,dy/dx \rfloor$, and $a = 2jN\,dy - 2js\,dx$. A prologue similar to the
prologue in the N-step algorithm could be used to compute these values.


y_i := 0;
r := a - j*dx;
for x_i := 0 to dx by N do begin
    if (r > 0) then begin
        Display Stroke of height s+1 at (x_i, y_i);
        y_i := y_i + s + 1;
        r := r + a - 2*j*dx;
    end
    else begin
        Display Stroke of height s at (x_i, y_i);
        y_i := y_i + s;
        r := r + a;
    end
end


In this algorithm both y and s are represented with an integer part and an i bit fractional part. The
fractional part of $y_i$ is the value of the subpixel positioning at the origin of the stroke and the
fractional part of s adds precision to the rise of the stroke. The situation is illustrated in Figure 5-16
for N = 8 and i = 2. The stroke can originate at any of the 4 subpixel origins and can have a rise of
$0 \le s \le 32$ in the horizontal span of 8 pixels. The line needs a total of 132 different strokes. In general
an N-step line drawing algorithm using i extra bits of precision would require a total of $(N 2^i + 1)\,2^i$
strokes. Some of these strokes are going to be the same and a method to reduce the storage
requirements will be the subject of discussion in the next section. Since the origin of the stroke will be
the rounded value of $y_i$ (for example, in the example above if the value of the low-order bits of $y_i$ is 2
then the first pixel of the stroke will be located at $y_i/4 + 1$), we can modify the algorithm to add a
constant to the initial value of $y_i$ and then merely use the integer part of $y_i$ as the origin of the stroke.
The modified algorithm is the following.

y_i := j/2;
r := a - j*dx;
for x_i := 0 to dx by N do begin
    if (r > 0) then begin
        Display Stroke of height s+1, offset y_i%j
            at (x_i, y_i/j);
        y_i := y_i + s + 1;
        r := r + a - 2*j*dx;
    end
    else begin
        Display Stroke of height s, offset y_i%j
            at (x_i, y_i/j);
        y_i := y_i + s;
        r := r + a;
    end
end
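The bookkeeping behind the extra precision can be illustrated with two small Python helpers. They are a sketch only: the names are hypothetical, and y is assumed to be held as an integer count of 1/j pixel units that already includes the initial bias of j/2.

```python
def stroke_table_size(n, extra_bits):
    """Number of strokes needed with `extra_bits` extra bits: (n*2^i + 1) * 2^i."""
    j = 1 << extra_bits
    return (n * j + 1) * j


def stroke_origin(y_fixed, extra_bits):
    """Split a biased fixed-point y into (pixel row, sub-pixel offset)."""
    j = 1 << extra_bits
    return y_fixed // j, y_fixed % j
```

With i = 2 the bias is j/2 = 2. An unbiased y of 4*5 + 2, whose low-order bits are 2, becomes 24 after the bias, and `stroke_origin(24, 2)` returns row 6, which is the y/4 + 1 of the example above. `stroke_table_size(8, 2)` gives the 132 strokes quoted for N = 8.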


Figure 5-17 shows the strokes used when N = 8 and i = 1. The use of extra precision in the




Figure  5-16:  Sub-pixel  positioning  and  extra  precision  rise. 

positioning of the strokes helps decrease the maximum error in the lines produced. Since the
individual strokes are placed to a precision of 1/j of a pixel, the stroke placement error can be a
maximum of 1/(2j). Within the stroke each pixel can also be placed to a precision of 1/j and will then
be rounded to the nearest pixel. The pixel placement produces a maximum error of 1/(2j) and the
rounding produces an error of 1/2. The total maximum error produced by using an algorithm with i
extra bits of precision is hence 1/2 + 1/j. This agrees with the simulations shown in Table 5-3 which
tabulates the maximum error in the N-step algorithm against different values of i. Notice that
increasing the precision would only make the lines asymptotically approach the optimum.

A better demonstration of an improvement in the appearance of the lines can be seen in Figure
5-18, which shows a line drawn using different amounts of precision. The sum of Fourier Transforms
confirms the better appearance of the lines and is shown in Figure 5-19.

All the results of this section are also valid for the (N-1)-step algorithm. The (N-1)-step
algorithm always produces slightly better lines than the N-step algorithm for a given amount of
precision. But using higher precision is a better alternative than using two steps in the inner loop, as
is done by the (N-1)-step algorithm. My choice for the line drawing algorithm to be used would be the
N-step algorithm with 2 extra bits of precision. Hence for N = 8, we would require a total of 132
strokes. Appendix C contains this complete algorithm including the stroke precomputing.


Figure 5-17: Strokes used in the N-step algorithm for N=8, i=1.
The two panels show the strokes for Offset = 0 and Offset = 0.5. Rise is marked next to the stroke.




i    Maximum error

0    0.992
1    0.992
2    0.742
3    0.617
4    0.531
5    0.516


Table 5-3: Maximum vertical error for different values of i.


Figure  5-18:  Line  drawn  with  different  amounts  of  extra  precision. 


5.4.1. Optimal lines using strokes

None of the stroke drawing algorithms we have discussed so far produce the optimal lines. The
technique discussed in the previous section only approaches the optimum asymptotically but cannot
achieve the optimum irrespective of the amount of precision used. This should not concern us to a


D  .  C  . 


Figure 5-19: S


large  extent  since  or 
the  optimum  lines  ( 




only a theoretical study of how one can produce the optimum lines. In practice this technique will
probably never be used.

The first question to ask is that if two strokes per line do not produce the optimum line, then what
is the maximum number of strokes that a line might have to use in order to make it optimum. In
order to determine this let us consider a line in the first octant with a rise of dy in dx. This line will
illuminate a certain set of pixels in the first N columns. Let this set define stroke number 0
(Stroke[0][0..N-1]). Now imagine moving the line up slowly, or in other words imagine the line y =
(dy/dx)x + c with the value of c being slowly increased from 0 to 1. Stroke[0] is the set of pixels
illuminated when c = 0. A parallel set of pixels will be illuminated for c = 1, with the difference
being that each point in this set will be one greater than in Stroke[0]. Hence for the increase of the
value of c from 0 to 1, the N pixels in Stroke[0] have all jumped up by one. This set of pixels will
actually jump up one at a time as c is increased gradually from 0 to 1. This technique gives us N
strokes and these can be used to generate the optimum line because these strokes represent all
possibilities when a line of this slope originates at a non-integer pixel boundary. These strokes can be
stored together with the values of c that each stroke was created with. The optimum line can now be
drawn by computing every Nth pixel along the line together with the offset from the pixel below,
looking up the closest value of the offset in the array of strokes, and then using that stroke at that location.
The r values of Bresenham's algorithm provide a measure of the offset and can be used instead of
the fractional values of c. This line drawing algorithm follows.




{ Generating the line using 8 strokes }

{ This computes s = floor(8*dy/dx) and a = 16*dy-2*s*dx }
s := 0; a := 0; r := dy-dx;
for i := 1 to 8 do begin
    if (r > 0) then begin
        a := a+2*dy-2*dx;
        s := s+1;
        r := r+dy-dx;
    end
    else begin
        a := a+2*dy;
        r := r+dy;
    end
end;

{ This computes the base stroke, which is the first eight
  pixels of the line and the error values associated with
  each of the pixels }
err[1] := 2*dy-dx; line[0][0] := 0;
for i := 1 to 7 do begin
    if (err[i] > 0) then begin
        line[0][i] := line[0][i-1]+1;
        err[i+1] := err[i]+2*dy-2*dx;
    end
    else begin
        line[0][i] := line[0][i-1];
        err[i+1] := err[i]+2*dy;
    end
end

{ This computes the other eight lines and the displacement
  from the base line associated with each of them. }
disp[0] := a-2*dx; disp[9] := 2*dxmax;
for i := 1 to 8 do begin
    rmax := -2*dxmax;
    for j := 0 to 7 do begin
        line[i][j] := line[i-1][j];
        if (err[j+1] > rmax) then begin
            rmax := err[j+1]; jmax := j;
        end
    end
    line[i][jmax] := line[i][jmax]+1;
    disp[i] := 2*dy-err[jmax+1]+a-2*dx;
    err[jmax+1] := -2*dxmax;
end

{ This is the main loop which uses the 9 strokes to draw
  the line. }
r := a-2*dx; y := 0;
for i := 0 to dx/8 do begin
    { Search for the stroke whose displacement matches r }
    j := 0;
    while (disp[j] < r) do j := j+1;
    jout := j-1;
    { Copy the chosen stroke, then advance y and r once per stroke }
    for j := 0 to 7 do
        array[i*8+j] := line[jout][j]+y;
    if (r > 0) then begin
        y := y+s+1;
        r := r+a-2*dx;
    end
    else begin
        y := y+s;
        r := r+a;
    end
end

The inner loop of this algorithm searches for the stroke to use and then uses that stroke. This
algorithm is obviously inefficient for all lines with lengths less than $N^2$, because the N strokes that
have to be computed contain $N^2$ pixels. Even for extremely long lines the inner loop would have to be
done extremely efficiently to make this algorithm worthwhile. Conventional processor
implementations would require a longer time to search for the stroke than to write it into memory,
especially if the memory allowed the updating of such a stroke in parallel.
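The idea of sweeping the offset c from 0 to 1 can be illustrated with exact rational arithmetic in Python. This sketch is not the integer procedure listed above: it assumes the pixel in column x is floor((dy/dx)x + c + 1/2), it records the offsets at which some column jumps, and the function name is hypothetical.

```python
from fractions import Fraction
from math import floor


def strokes_for_slope(dy, dx, n):
    """Return (offset, stroke) pairs for the line y = (dy/dx)x + c, 0 <= c < 1."""
    m = Fraction(dy, dx)
    half = Fraction(1, 2)
    # Offset in (0, 1] at which column x jumps up by one pixel.
    jumps = sorted({1 - ((m * x + half) % 1) for x in range(n)})
    offsets = [Fraction(0)] + [c for c in jumps if c < 1]
    return [(c, [floor(m * x + c + half) for x in range(n)]) for c in offsets]
```

For a slope of 3/10 and N = 8 every column jumps at a different offset, so the sweep yields eight strokes, starting from the base stroke 0, 0, 1, 1, 1, 2, 2, 2, with each stroke differing from the previous one in a single column. Slopes for which several columns jump at the same offset give fewer distinct strokes.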

5.5.  Total  number  of  strokes 

We have studied several algorithms that draw lines using N-pixel strokes. These algorithms use
different numbers of strokes to draw a line and hence require a different set of strokes to choose
from. But for a given stroke size there is a maximum number of strokes that ever get used and all
possible lines can be drawn using these strokes. Although at first sight it may seem that for N-pixel
strokes with a rise of s there should be $N!/(s!\,(N-s)!)$ different possible strokes, this is not the case
because the jumps have to be spaced out evenly in order for the stroke to be part of a line. Hence, for
example, a stroke that jumps four pixels in the first four pixels of an eight pixel stroke and then stays
horizontal would never be used. Table 5-4 contains all the possible strokes for N = 8. Table 5-5
tabulates the total number of strokes required for different stroke sizes.

None of the algorithms we have discussed can choose the optimum stroke from this set. Some of
the algorithms choose from a smaller set and some choose from a larger set. In either case the set used
is a subset of the total number of strokes possible. In the case when the algorithm chooses from a
larger set of strokes, the set will contain non-unique strokes. A simple way of avoiding the
redundancy is to provide a table-lookup with pointers to one of the total set of strokes.


00000000    00000001    00000011
00000111    00001111    00011111
00111111    01111111    00111112
00111122    00111222    01111222
01112222    01111112    01111122
00001112    00011112    00011122
00011222    00112223    00112233
01112233    01122233    01112223
01122333    00011223    00111223
01122334    01122344    01123344
01223344    00112234    00112334
00122344    00122334    01123345
01123445    01223445    01223455
01223345    01233455    00122345
00123345    00123445    00123455
01123455    01223456    01233456
01234456    01234556    01234566
00123456    01123456    01234567


Number of strokes = 54


Table 5-4: All possible strokes for N = 8.

 N    Number of strokes
 2          2
 3          3
 4          8
 5         14
 6         24
 7         36
 8         54
 9         76
10        104
11        136
12        178

Table 5-5: Number of all possible strokes for different values of N.




Appendix  A 

This appendix proves that the lines produced by the N-step algorithm always have a vertical error
of less than 1, and hence that the algorithm always generates lines with matching end-points. The proof is
adapted from a similar proof by Mike Spreitzer of CalTech. The proof assumes that the line
is drawn from (0,0) to (dx,dy) and that dx > dy > 0. We shall use $\epsilon$ as the measure of vertical error for
the line; $\epsilon_i$ is the signed distance from the pixel at $x = i$ to the line. This distance can be computed as
$y_i - i\,dy/dx$.

The N-step algorithm computes every Nth point of the line and guarantees that $|\epsilon_{Ni}| \le 1/2$. These
points are joined with one of two strokes that represent a line of slope $s/N$, where $s$ is the vertical
distance from the origin of the stroke to the origin of the next stroke ($= y_{Ni+N} - y_{Ni}$). If the stroke is
generated using the Bresenham algorithm aiming at a slope of $s/N$, then we have
$|\epsilon_{Ni+j} - (j/N)(\epsilon_{Ni+N} - \epsilon_{Ni})| \le 1/2$ for $0 \le j \le N$. We consider two cases: when the expression
inside the absolute value is positive and when it is negative.

1. When the expression is positive we have $\epsilon_{Ni+j} \le (j/N)(\epsilon_{Ni+N} - \epsilon_{Ni}) + 1/2$. From the triangle
inequality, we know that $|\epsilon_{Ni+N} - \epsilon_{Ni}| \le 1$. However, the equality case never occurs because if it
did, then the slope of the line would be an integer multiple of $1/N$ and $\epsilon_{Ni}$ would be zero for all $i$. So
the triangle inequality can be restated as $|\epsilon_{Ni+N} - \epsilon_{Ni}| < 1$. For $1 \le j \le N/2$, this yields $\epsilon_{Ni+j} < 1$.

2. The negative case, by similar arguments, yields $\epsilon_{Ni+j} > -1$ for $1 \le j \le N/2$.

Both these arguments together result in $|\epsilon_{Ni+j}| < 1$ for $1 \le j \le N/2$.

Approaching from the other side (i.e. $x = Ni - j$), we get $|\epsilon_{Ni-j}| < 1$ for $1 \le j \le N/2$.
Combining the two results we have $|\epsilon_{Ni+j}| < 1$ for $1 \le j \le N$ when $N$ is even.

This proves that the maximum vertical error at any point can never equal or exceed 1. It
also proves that the end-point shall be computed correctly, because if the end-point was
computed incorrectly then the error at the end-point would be equal to or greater than 1.

Appendix  B 

In the (N-1)-step algorithm the condition for the vertical error at any point along the line gets
restated as $|\epsilon_{Ni+j} - (j/(N-1))(\epsilon_{Ni+N-1} - \epsilon_{Ni})| \le 1/2$. When $N$ is odd, the step of combining
$|\epsilon_{Ni+j}| < 1$ for $1 \le j \le (N-1)/2$ and $|\epsilon_{Ni-j}| < 1$ for $1 \le j \le (N-1)/2$ does not guarantee
$|\epsilon_{Ni+j}| < 1$ for $j = (N+1)/2$. So if $N = 7$, then the (N-1)-step algorithm does not guarantee that the
error will be less than 1 at the mid-point of the stroke, that is when $j = 4$.

Figure  5-20  shows  an  example  for  N  =  3  for  a  line  with  dx  =  10  and  dy  =  5. 


Figure 5-20: Error of 1 made by the (N-1)-step algorithm
in a line with dx = 10, dy = 5, and N = 3. (Panels: shallow stroke; steep stroke.)

Appendix  C 

This is the recommended line drawing algorithm for 8-pixel strokes. The first algorithm presented
precomputes the necessary strokes. It uses multiplications, which does not
matter at precomputing time, although the obvious incremental algorithm can be used instead.

{ Precomputing strokes }
for height := 0 to 32 do
    for offset := 0 to 3 do
        for x := 0 to 7 do
            { row of pixel x within the stroke }
            Stroke(height,offset)[x] := (offset + height*x/8)/4;
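As a cross-check of the precomputation, here is a minimal Python sketch of the same table. It assumes that heights and offsets are in quarter-pixel units, that the final division is by 4, and that all divisions truncate; the name `precompute_strokes` and the list-of-lists layout are choices made for this example only.

```python
def precompute_strokes(pixels=8, subpixel=4):
    """strokes[height][offset][x] = row, relative to the stroke origin, of pixel x."""
    max_height = pixels * subpixel          # 32 quarter-pixels: a 45-degree stroke
    return [[[(offset + height * x // pixels) // subpixel for x in range(pixels)]
             for offset in range(subpixel)]
            for height in range(max_height + 1)]


strokes = precompute_strokes()
```

With these assumptions `strokes[32][0]` is the diagonal stroke 0,1,...,7 and `strokes[16][2]` is 0,1,1,2,2,3,3,4.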


The following algorithm uses the precomputed strokes to draw the line.




{ Line drawing algorithm }

{ Prologue }
s := 0; a := 0;
r := 4*dy - dx;
for i := 0 to 7 do begin
    a := a + 8*dy;
    if (r > 0) then begin
        s := s + 1;
        a := a - 8*dx;
        r := r + 4*dy - dx;
    end
    else r := r + 4*dy;
end;

{ Line drawing }
{ yi is kept in quarter-pixel units: yi/4 is the row, yi%4 the stroke offset }
yi := 2;
r := a - 4*dx;
for xi := 0 to dx by 8 do begin
    if (r > 0) then begin
        Display Stroke(s+1, yi%4) at (xi, yi/4);
        yi := yi + s + 1;
        r := r + a - 8*dx;
    end
    else begin
        Display Stroke(s, yi%4) at (xi, yi/4);
        yi := yi + s;
        r := r + a;
    end;
end;


Appendix  D 

For lines drawn from (0,0) to (dx,dy) such that 0 < dy < dx, the Digital Differential Analyzer
(DDA) algorithm draws the line in the following manner.

yi := 0; r := (dy/dx)/2;
for xi := 0 to dx do begin
    display(xi,yi);
    if (r > 1/2) then begin
        yi := yi + 1;
        r := r + dy/dx - 1;
    end
    else begin
        r := r + dy/dx;
    end;
end;


Implementations of the DDA represent dy/dx as an integer with a fixed amount of precision (a
fixed-point fraction). Because this rational number may be a repeating fraction, this representation
cannot represent some fractions without incorporating an error. The error can cause the diagonal or
horizontal move decision to take the wrong direction and result in an incorrect line. An example is
when dx = 12 and dy = 2. The fraction 2/12 cannot be represented exactly in this form (because it is a
repeating binary fraction of the form 0.0010101010101010...). Depending upon the amount of precision
used the fraction will be either rounded up or down. In the case when it is rounded down, the value of
$y_i$ at $x_i = 9$ will be less than the actual 1.5, and the DDA will hence illuminate the pixel (9,1), which is
incorrect. Rounding up can similarly cause a higher pixel to be illuminated. No amount of precision
will generate correct lines, although the maximum error will indeed decrease with increasing amounts
of precision.
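The dx = 12, dy = 2 example can be reproduced with a short Python sketch. This is an illustration under stated assumptions rather than the listing above: it accumulates y directly, starts the accumulator at one half so that truncation rounds to the nearest pixel, and truncates the slope to `bits` fractional bits. The names `dda_exact` and `dda_fixed` are hypothetical.

```python
from fractions import Fraction
from math import floor


def dda_exact(dx, dy):
    """Reference line: y = x*dy/dx rounded half up, using exact rationals."""
    return [floor(Fraction(1, 2) + Fraction(x * dy, dx)) for x in range(dx + 1)]


def dda_fixed(dx, dy, bits):
    """DDA with the slope rounded down to `bits` fractional bits."""
    slope = (dy << bits) // dx          # floor(dy/dx * 2**bits)
    acc = 1 << (bits - 1)               # one half, so that >> rounds to nearest
    ys = []
    for _ in range(dx + 1):
        ys.append(acc >> bits)
        acc += slope
    return ys


exact_y9 = dda_exact(12, 2)[9]
fixed_y9 = [dda_fixed(12, 2, bits)[9] for bits in (8, 16, 30)]
```

Here `exact_y9` is 2, while every entry of `fixed_y9` is 1: because 2**bits is never divisible by 3, the truncated slope always falls short of 1/6, whatever precision is chosen.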




Chapter 6
Filtering 


Bit-map images show annoying visual effects in the form of "jaggies" or "staircases" when a
shaded region jumps between rows or columns of pixels. On a display device that can present
gray-scale images, this effect can be avoided by smoothing the jump using gray intensity values
between white and black.

The jagged effect is due to the sampling of a sharp edge over a fixed grid. The original edge
contains frequencies higher than those that can be faithfully reproduced by the samples (Nyquist's
theorem). This results in higher frequencies aliasing as lower ones; the error is hence known as
aliasing. Figure 6-1 shows a high and a low frequency signal, both of which result in the same samples.
Low-pass filtering of the signal before sampling can remove the aliasing problem; this preprocessing
is hence called anti-aliasing. Figure 6-2 illustrates this situation in one dimension. The first part shows
the sampling of the unfiltered signal and the second part shows the sampling of the filtered signal. As
illustrated by the figure, the filtered samples require the ability to display gray intensity values. For
the proper appearance of the resulting image the filter has to be properly chosen, else the images
produced tend to show other annoying aliasing artifacts. In general, the filter should be a parameter
of the anti-aliasing computations and could be varied until an acceptable image results.


Figure 6-1: The aliasing problem.


Figure 6-2: Sampling an unfiltered and a filtered signal. (Panels: unfiltered signal; filtered signal.)

The filtering of the image with a low-pass filter is equivalent to an averaging process: the intensity
of a sample is determined by the image brightness in the near vicinity of the pixel. The averaging
function is the space-domain equivalent of the low-pass frequency domain filter and is known as the
filter function. The filter function is also related to the spatial frequency distribution of the light
emitted by a pixel on the display. Figure 6-3 shows a few filter functions together with their
frequency spectrums. The first filter in the figure is the impulse function, which corresponds to the
images produced by the bit-map algorithms that exhibit the aliasing problems. The sinc filter has the
ideal low-pass frequency response and results in samples that truly model the low-frequency
characteristics of the sampled signal. The triangular filter, which is an approximation of the sinc filter,
is attractive for its mathematical tractability and is probably the most commonly used function. The
rectangular filter is probably the easiest filter to implement because the averaging process involves
only additions. The Gaussian filter is the closest approximation of the spatial distribution of the light
emitted by a pixel. Apart from the shape, the other important property of the filter function is its
extent. All filters with a finite extent pass some of the frequencies beyond the Nyquist frequency. In
practice both the shape and extent are varied until an aesthetically satisfactory image results.

In order to sample two-dimensional images the filter functions have to be extended to two
dimensions. There are two different ways in which this can be done. In the first method the
one-dimensional filter is rotated circularly to produce the two-dimensional filter. This results in a
circularly symmetric filter which has some computational advantages. The other method extends the
one-dimensional filter into two dimensions by merely multiplying two one-dimensional filters. In
practice either of the two techniques can be used, depending upon the convenience of the situation.
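The two extensions can be compared with a small Python sketch. It uses an unnormalized triangular profile with peak value 1; the function names are illustrative and not taken from this chapter.

```python
import math


def triangle(d, r=1.0):
    """One-dimensional triangular filter of half-width r (peak value 1)."""
    return max(0.0, 1.0 - abs(d) / r)


def circular_2d(x, y, r=1.0):
    """First method: rotate the 1-D profile about the origin, giving a cone."""
    return triangle(math.hypot(x, y), r)


def separable_2d(x, y, r=1.0):
    """Second method: multiply two 1-D filters, one per axis."""
    return triangle(x, r) * triangle(y, r)
```

The two agree along the axes but differ elsewhere: at (0.5, 0.5) the circular form gives about 0.29 and the product form gives 0.25, and at (0.8, 0.8) the circular form is already zero while the product form is still 0.04.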


Figure 6-3: A few possible filter functions and their frequency response.
(Columns: filter function, frequency spectrum. Rows: impulse, sinc, triangular, rectangular.)


6.1.  Computing  filtered  images 

The most popular method for computing filtered images is to numerically integrate the image over
the area of the filter to compute the intensity of each pixel in the image. The step size for the
integration is determined by the accuracy desired in the intensity value. The flat square filter function
which extends to the adjoining pixels (Figure 6-4) will be used to illustrate the problem. Other
functions result in similar conclusions.


Figure  6-4:  Flat  square  filter  function. 


If the pixel resulting from the averaging process is supposed to be accurate to 8 bits (i.e. 1 part in
256), then the image computing process should provide at least 256 equi-spaced samples (on a 16x16
grid) which can then be averaged to compute the intensity value. This implies that the image has to
be computed to a higher resolution and then averaged to compute the filtered image. The resolution
should be 8 times the actual resolution in order to compute an image with a resolution of 8 bits. This
is the reason that an 8-bit gray-scale image is considered to be equivalent to one that has 8 times the
resolution. Similarly an image with 4 bits of gray-scale would be equivalent to one with twice the
resolution. Different filter functions result in approximately the same numbers for the image
resolution desired. Experiments with human visual perception confirm these results [Leler 80].

It is seldom necessary to compute a high resolution image because geometric information leads
directly to the filtered intensity of the pixels. The following subsection illustrates this technique with
an example.

6.1.1.  Filtering  straight  edges 


Throughout the rest of this chapter we shall assume the use of a conical filter function: the
function has its maximum value at the center of a pixel and decreases linearly to zero at a distance r
from the pixel center. The radius of the filter is r, measured using the convention that a unit distance
is the distance between two adjacent pixel centers. The function is normalized so that the enclosed
volume is 1. This function is the circular extension of the triangular filter. Figure 6-5 shows the filter
function with radius 1. This filter is chosen both for its mathematical properties and the fact that it is
a close approximation to the ideal sinc function.




When a straight edge passes through a pixel, the pixel's intensity should be proportional to the
volume of the cone intersected by the edge (see Figure 6-6). Because of the circular symmetry of the
filter, only the perpendicular distance p from the pixel center to the edge is needed to determine the
volume intersected. Thus we may write $I = F_e(p)$, where $F_e$ is determined solely by the choice of the
filter, provided it is circularly symmetric [4]. Note that $F_e(p) = 0$ for $p \le -r$, and $F_e(p) = 1$ for $p \ge r$.
For $-r < p < r$, $F_e(p)$ can be computed by numerically integrating the volume of the cone intersected
by an edge at a distance p from the pixel center.


Since I is represented as a g-bit integer which can take on only $2^g$ distinct values, the function $F_e$
can be converted into a simple linear table. If $F_e$ is a linear function then this table would contain
only $2^g$ entries; for a nonlinear function the number of entries is determined by the maximum value
of the derivative of the function. Table 6-1 illustrates values of $F_e$ for the conical filter with r = 1, if I
has 4 bits of resolution. Because of the limited resolution in the range of the table, we can actually
cut off the table at a domain slightly shorter than the theoretical maximum allowed by the function.
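The numerical integration and the conversion to a 4-bit table can be sketched in Python as follows. This is one possible approach, not necessarily how the tables in this chapter were produced: it integrates the unit-volume cone with a midpoint rule on a grid and rounds 15 times the result to the nearest integer. The grid size `n`, the rounding rule, and all names are assumptions of this example.

```python
import math


def cone(rho, r=1.0):
    """Conical filter of radius r with unit volume; rho is the distance from the center."""
    return 3.0 * (r - rho) / (math.pi * r ** 3) if rho < r else 0.0


def make_edge_function(r=1.0, n=400):
    """Return F_e, built by midpoint-rule integration of the cone on an n x n grid."""
    h = 2.0 * r / n
    centers = [-r + (k + 0.5) * h for k in range(n)]
    # Volume of each vertical strip of the cone, then running totals across strips.
    cumulative = [0.0]
    for x in centers:
        strip = sum(cone(math.hypot(x, y), r) for y in centers) * h * h
        cumulative.append(cumulative[-1] + strip)
    total = cumulative[-1]              # close to 1; dividing removes grid error

    def f_e(p):
        if p <= -r:
            return 0.0
        if p >= r:
            return 1.0
        t = (p + r) / h
        k = min(int(t), n - 1)
        partial = cumulative[k] + (t - k) * (cumulative[k + 1] - cumulative[k])
        return partial / total

    return f_e


f_e = make_edge_function()
# 4-bit table for p = -12/16 ... 12/16, quantised to the 16 levels 0..15.
table = [round(15 * f_e(k / 16)) for k in range(-12, 13)]
```

With these choices the computed entries should come out close to Table 6-1, for example 11 at p = 4/16 and 13 at p = 8/16; entries that sit near a rounding boundary may depend on the rounding rule.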

      p       Fe(p)           p       Fe(p)
 <= -12/16      0            1/16       8
    -11/16      1            2/16       9
    -10/16      1            3/16      10
     -9/16      1            4/16      11
     -8/16      2            5/16      12
     -7/16      2            6/16      12
     -6/16      3            7/16      13
     -5/16      3            8/16      13
     -4/16      4            9/16      14
     -3/16      5           10/16      14
     -2/16      6           11/16      14
     -1/16      7        >= 12/16      15
      0/16      8

Table 6-1: Pixel intensities from the distance to edges.

Table look-up makes the anti-aliasing computations more efficient. By performing the usual
scan-conversion computations with slightly higher precision, we can then use the sub-pixel position on
an edge as an index into a table which will provide the shade value for a pixel. We have discussed one
such table for single edges intersecting pixels. The next subsection shall discuss a universal table that
can be used for all polygonal geometries. It is beneficial, however, to compile separate tables which can
be used more efficiently for more frequently used geometries. One such example is line drawing.
Intensities for pixels intersected by a line can be considered as the difference between the intensity
contributions of the two edges of the line (Figure 6-7). Hence there are two parameters that are
needed to determine the intensity: the thickness t of the line and the perpendicular distance p from
the pixel center to the line center. Thus we may write $I = F_l(p, t)$, where $F_l$ again is determined solely
by the choice of the filter function. Table 6-2 illustrates values of $F_l$ for t = 1 for the conical filter
function of radius 1.

[4] This discussion will assume that each object shall be intensified to the maximum intensity. Objects
with lower intensities can be shown by merely scaling the intensities used in this discussion.


6.1.2. Universal Table for polygons


All polygons can be split up into trapezoids by drawing horizontal lines at each corner (Figure 4-5).
It shall be assumed that each polygon is formed by combining its constituent trapezoids. This section
shall discuss the anti-aliasing tables needed to form trapezoids. To do so, we shall first examine the
eleven possible ways in which a trapezoid can intersect a pixel.

1. The pixel is totally covered by the trapezoid (Figure 6-8). The intensity of such a pixel
shall be the same as the shade of the polygon. Hence $I = Max$.

2. Only one side edge of the trapezoid intersects the pixel (Figure 6-9). Only the
perpendicular distance from the center of the pixel to the edge is needed to determine the
intensity of the pixel. Hence we can say that $I = F_e(p)$.

3. Only one horizontal edge of the trapezoid intersects the pixel (Figure 6-10). Once again
only the perpendicular distance is needed, but in this case the perpendicular distance is
the same as the vertical distance. If $y$ is the position of the center of the pixel and $y_1$ is the
position of the edge then the perpendicular distance is $y - y_1$. Hence $I = F_e(y_1 - y)$.

4. Both the side edges of the trapezoid intersect the pixel (Figure 6-11). In this case the
intensity of the pixel is the difference between the intensity contributions of the two edges.
Hence $I = F_e(p_1) - F_e(p_2)$.

5. Both the horizontal edges of the trapezoid intersect the pixel (Figure 6-12).
As in the previous case, the intensity for such pixels is the difference between the
intensity contributions of the two edges. $I = F_e(y_1 - y) - F_e(y_2 - y)$.

6. One corner of the trapezoid intersects the pixel (Figure 6-13).
To determine the intensity of the pixel in this case we need the distances to both the edges
and the angle subtended between them. This can be formulated as $I = F(y - y_1, p, \theta)$.

7. In the case when the trapezoid is a triangle, a corner of the triangle intersects the pixel
(Figure 6-14). This case can be considered as the difference of two corners. Hence
$I = F(y - y_1, p_1, \theta_1) - F(y - y_1, p_2, \theta_2)$.

8. Two corners which are connected by a horizontal edge intersect the pixel (Figure 6-15).
This situation is similar to the previous one. $I = F(y - y_1, p_1, \theta_1) - F(y - y_1, p_2, \theta_2)$.

9. Two corners which are connected by a side edge intersect the pixel (Figure 6-16). This
can also be formulated as the difference of two corners.
$I = F(y - y_1, p, \theta) - F(y - y_2, p, \theta)$.

10. Three corners of the trapezoid intersect the pixel (Figure 6-17). The intensity of the pixel
can be determined as the difference between three corners.
$I = F(y - y_1, p_1, \theta_1) - F(y - y_1, p_2, \theta_2) - F(y - y_2, p_1, \theta_1)$.

11. The trapezoid is completely enclosed in the pixel (Figure 6-18). This situation is similar
to the previous one. $I = F(y - y_1, p_1, \theta_1) - F(y - y_1, p_2, \theta_2) - F(y - y_2, p_1, \theta_1)$.

Because $F_e$ is a subset of $F$ with y = 1.0, compiling a table for $F$ is sufficient to determine the
intensity in all possible situations. Appendix A contains this table for the conical filter with r = 1. As
seen in the previous tables, the distance parameters of the table span the range between -12/16 and
+12/16 at a spacing of 1/16. The angular parameter has the range between 0 and $\pi/2$ with a spacing
of $\pi/8$. The reason for the angular spacing is easily seen from the fact that we are using a circular filter
and a 4 bit grayscale resolution. The total angle of $2\pi$ subtended at the center of the pixel is divided
by the total number of gray-levels to give the angular spacing of $2\pi/16$, i.e. $\pi/8$.

The reason for discussing the various intersections of the trapezoid with the pixel was that we can
save effort if we can recognize that the intersection is a simple case as opposed to assuming that all
the edges intersect the pixel. If the algorithm used does not identify the simpler cases, then we can
always assume the most complicated case, and compute the vertical distances $y_1$ and $y_2$ to the
horizontal edges and the perpendicular distances $p_1$ and $p_2$ to the side edges. These distances are then
thresholded to +12/16 or -12/16, and the thresholded values used to look up the intensity table.
The angles subtended by the side edges, $\theta_1$ and $\theta_2$, remain constant throughout the whole
computation and can be used to set up pointers to the appropriate two-dimensional tables which then
require only the y and p indices.
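The thresholding and look-up step can be sketched in Python. For brevity the sketch uses the one-dimensional edge table of Table 6-1 instead of the full universal table, and it handles only the side-edge cases 2 and 4; the helper names and the round-to-nearest indexing are assumptions of this example.

```python
# Table 6-1 entries for p = -12/16, -11/16, ..., 12/16.
F_E_TABLE = [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8,
             8, 9, 10, 11, 12, 12, 13, 13, 14, 14, 14, 15]


def table_index(distance, limit=12, steps_per_pixel=16):
    """Quantise a distance to sixteenths, threshold to +/-12/16, and offset to 0..24."""
    k = round(distance * steps_per_pixel)
    k = max(-limit, min(limit, k))
    return k + limit


def edge_intensity(p):
    """Case 2: a single side edge at perpendicular distance p."""
    return F_E_TABLE[table_index(p)]


def two_edge_intensity(p1, p2):
    """Case 4: both side edges intersect the pixel."""
    return edge_intensity(p1) - edge_intensity(p2)
```

For example, `edge_intensity(0.25)` returns 11, any distance beyond 12/16 returns 15, and `two_edge_intensity(0.25, -0.25)` returns 11 - 4 = 7.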

The  next  chapter  discusses  algorithms  used  to  fill  trapezoids  using  the  universal  table. 

6.2.  Combining  filtered  images 

The previous section discussed a method for anti-aliasing images by computing the images
to a higher resolution and then averaging several neighboring pixels to compute the filtered
intensities. We also presented a computationally cheaper method which filters the images by using
geometrical information to compute the intensities for individual pixels directly.

When the only information retained is the intensity values of individual pixels, we face a problem
in filtering two or more interacting lines or edges properly. The problem arises when we compute
the second of two interacting geometries, because the pixels computed may already have an intensity
value stored in them with no hint about the geometry of the edge that passed through them. One
possible situation is shown in Figure 6-19, where the amount of the overlapping area between the two
edges would have to be known in order to compute the correct intensity of the combined geometry.
There is absolutely no way to deduce the overlapping area if the only information retained from the
edge computed first is the intensity value for the pixel.

Some simplifying assumptions can allow us to approximate this combination process. One such
assumption is often valid when the objects are part of a three-dimensional scene. The individual
objects are then computed in a deepest-object-first order. Figure 6-20 shows two such objects which
intersect the same pixel. The first object is deeper and hence the intensity $I_1$ was computed before $I_2$.
The assumption we shall make is that the two objects are relatively far apart so that the light passed
through the one object is uniformly dispersed by the time it gets to the second object. We shall also
assume that $I_1$ and $I_2$ are fractional values and 1 is the maximum intensity.

Figure 6-19: Interacting geometries within a pixel.

For a dark background, the total light reflected has an $I_2$ contribution from the second object and
a contribution of $(1 - I_2) I_1$ from the first object. The resultant intensity is hence

    $I = I_2 + (1 - I_2) I_1$
    $  = I_1 + I_2 - I_1 I_2$

This formulation is equivalent to one which merely multiplies the black areas of each computation.

    $I = 1 - (1 - I_1)(1 - I_2)$
    $  = I_1 + I_2 - I_1 I_2$

For white backgrounds, the deeper object transmits an intensity $(1 - I_1)$ which is diminished to
$(1 - I_1)(1 - I_2)$ by the second object. The resultant intensity for white backgrounds is hence

    $I = (1 - I_1)(1 - I_2)$
    $  = 1 - I_1 - I_2 + I_1 I_2$.

The $I_1 + I_2 - I_1 I_2$ formulation gives a reasonable approximation in the appropriate
situations. When both $I_1$ and $I_2$ are small fractions, indicating that only small portions of
the pixel are intersected by the objects, then the combination function will result in approximately
the sum of the two intensities. This approximation is fairly reasonable because the probability of the
two small portions overlapping each other is very small. In the case when one of the intensities is
much larger than the other one, then the combination of the two is nearly equal to the larger one,
which corresponds to the most likely situation of the larger object completely overlapping the smaller
one.

Figure 6-20: Overlapping objects within a pixel.


Several simpler approximations are often used to combine the intensity values computed by
filtering image objects. The popular ones are:

$I = Max(I_1, I_2)$ for images with a dark background,

$I = Min(I_1, I_2)$ for images with a white background,

$I = I_1 + I_2$, which assumes that the objects do not overlap.

It should be noted that under known circumstances one of the easier combination functions might
be used. For example when it is known a priori that the objects abut each other, the intensities must
simply be added to compute the resultant intensity. This situation occurs when polygons are created
by combining trapezoids.
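The combination functions of this section are compared in the Python sketch below. Intensities are fractions between 0 and 1 as assumed above; the function names are illustrative only.

```python
def combine_dark(i1, i2):
    """Dark background: I = I1 + I2 - I1*I2."""
    return i1 + i2 - i1 * i2


def combine_white(i1, i2):
    """White background: I = (1 - I1)(1 - I2)."""
    return (1.0 - i1) * (1.0 - i2)


def combine_max(i1, i2):
    """Simpler approximation for dark backgrounds."""
    return max(i1, i2)


def combine_sum(i1, i2):
    """Exact when the objects abut without overlapping; clamped to full intensity."""
    return min(1.0, i1 + i2)
```

For two small intensities such as 0.1 and 0.05, `combine_dark` gives 0.145, close to their sum 0.15; for 0.9 and 0.1 it gives 0.91, close to the larger value, which is the behaviour described in the text.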

Another solution to the combination problem is to keep a few bits of geometry information with
each intensity value. This information can come from the tables used to look up the intensities for
particular geometries and can be used to correct intensities for pixels where different edges interact.
If the final image has a 4-bit resolution, then the area of the pixel should be divided into 16 equal
parts, and 16 extra bits per pixel would keep track of whether the corresponding part of the pixel is
covered or not. Since we are using a circular filter, the pixel would be divided radially (Figure 6-21).
The combination function would then be

    $I = I_1 + I_2 - BitsSet(G_1 \text{ AND } G_2)/16$,

where $G_1$ and $G_2$ are the 16-bit numbers corresponding to the geometry information in the old and
the new intensities. The function BitsSet counts the number of bits set in its argument. This approach
effectively takes us back to computing the image to a higher virtual resolution in which the
combination problem does not exist to start with.

Figure 6-21: The pixel area can be divided radially to track which part of the
pixel is already covered.
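A minimal Python sketch of this combination function follows. It assumes that intensities are fractions in units of 1/16 and that the stored geometry word for the combined pixel is the bitwise OR of the two masks; the text does not say how the merged mask is kept, so that part and the function names are assumptions.

```python
def bits_set(word):
    """Count the bits set in a non-negative integer."""
    return bin(word).count("1")


def combine_with_geometry(i1, g1, i2, g2, parts=16):
    """Return (intensity, geometry) after adding a new object to a pixel."""
    overlap = bits_set(g1 & g2)
    intensity = i1 + i2 - overlap / parts
    return intensity, g1 | g2       # merged coverage mask (an assumption)


# Two objects that each cover 8 of the 16 sectors and share 4 of them.
intensity, geometry = combine_with_geometry(0.5, 0x00FF, 0.5, 0x0FF0)
```

Here the result is an intensity of 0.75 with 12 sectors marked as covered, so the overlap is counted only once.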

Experiments with lines and polygons indicate that the combination approximations introduced in
this section produce satisfactory results for static images. It is easy to deal with known geometrical
situations, where the correct solution can be used. An example is the case of abutting line segments and
trapezoids, where adding the intensities produces the correct result. Unknown geometrical situations
call for approximate combination functions. The Max combination function works satisfactorily for
intersecting lines and edges with dark backgrounds.

The next chapter shall introduce various scan-conversion algorithms that can be used to create
some of the popular geometries that use the filtering techniques discussed in this chapter.

6.3.  Summary 

This chapter has presented efficient filtering techniques that can be used to smooth the edges of
polygons and lines. The technique primarily uses table lookup as a means of parametrizing the
computations with respect to the filter function. It should be emphasized again that the filter function
should be a parameter of the computations. Both the size and the shape of the filter function are
usually varied until a satisfactory image results. Changing these parameters is reflected as a change
in the tables as well as in the algorithms that use these tables and the precision used in them. This
chapter does not address the problem of how the filter function affects the algorithmic aspect of the
computation.


The chapter also studies the problems of combining the filtered intensities of different objects. The
combination is done in the intensity domain, which only represents a logarithmic value of the actual
photon energy. The intensity values are translated into energy in a table lookup step in the display, a
step usually referred to as gamma correction. The combination function is an important step in
obtaining good images, and is usually also varied.

Appendix  A 


This appendix tabulates the function $F(y, p, \theta)$. Each row of the table contains the values for
-12/16 <= p <= 12/16 at intervals of 1/16, for given values of y and θ. The values of y lie in the range
-12/16 <= y <= 12/16 at intervals of 1/16. θ is in the range 0 <= θ <= π/2 at intervals of π/8; the
intensities for other angles can easily be derived.
Each row lists F(y,p,θ) for p = -12/16, -11/16, ..., 12/16 (25 entries, in groups of five).
* marks entries that are illegible in the source; the values shown are inferred from the surrounding entries.

y = -12/16, θ = 0       0 0 1 1 1  2 3 3 4 5  6 7 8 9 10  10 11 12 13 13  14 14 15 15 15
y = -12/16, θ = π/8     0 0 1 1 1  2 3 3 4 5  6 7 8 9 10  10 11 12 13 13  14 14 15 15 15
y = -12/16, θ = π/4     0 0 1 1 2  2 3 3 4 5  6 7 8 9 10  10 11 12 13 13  14 14 15 15 15
y = -12/16, θ = 3π/8    0 1 1 1 2  2 3 3 4 5  6 7 8 9 10  10 11 12 13 13  14 14 15 15 15
y = -12/16, θ = π/2     0 1 1 1 2  2 3 4 4 5  6 7 8 9 10  11 11 12 13 13  14 14 15 15 15

y = -11/16, θ = 0       0 0 0 1 1  2 2 3 4 5  6 6 7 8 9  10 11 12 13 13  14 14 15 15 15
y = -11/16, θ = π/8     0 0 0 1 1  2 2 3 4 5  6 6 7 8 9  10 11 12 12 13  14 14 15 15 15
y = -11/16, θ = π/4     0 0 1 1 1  2 2 3 4 5  6 6 7 8 9  10 11 12 12 13  14 14 15 15 15
y = -11/16, θ = 3π/8    0 1 1 1 2  2 3 3 4 5  6 7 8 8 9  10 11 12 12 13  14 14 15 15 15
y = -11/16, θ = π/2     0 1 1 1 2  2 3 4 4 5  6 7 8 9 9  10 11 12 13 13  14 14 15 15 15

y = -10/16, θ = 0       0 0 0 0 1  1 2 3 4 4  5 6 7 8 9  10 11 11 12 13  13 14 14 15 15
y = -10/16, θ = π/8     0 0 0 1 1  1 2 3 4 4  5 6 7 8 9  10 11 11 12 13  13 14 14 15 15
y = -10/16, θ = π/4     0 0 1 1 1  2 2 3 4 4  5 6 7 8 9  10 11 11 12 13  13 14 14 15 15
y = -10/16, θ = 3π/8    0 1 1 1 2  2 3 3 4 5  6 6 7 8 9  10 11 11 12 13  13 14 14 15 15
y = -10/16, θ = π/2     0 1 1 1 2  2 3 3 4 5  6 7 8 8 9  10 11 12 12 13  13 14 14 15 15

y = -9/16, θ = 0        0 0 0 0 0  1 2 2 3 4  5 6 7 8 9  9 10 11 12 12  13 13 14 14 14
y = -9/16, θ = π/8      0 0 0 0 1  1 2 2 3 4  5 6 7 8 9  9 10 11 12 12  13 13 14 14 14
y = -9/16, θ = π/4      0 0 0 1 1  2 2 3 3 4  5 6 7 8 9  9 10 11 12 12  13 13 14 14 14
y = -9/16, θ = 3π/8     0 0 1 1 1  2 2 3 4 4  5 6 7 8 9  10 10 11 12 12  13 13 14 14 14
y = -9/16, θ = π/2      0 1 1 1 2  2 3 3 4 5  6 6 7 8 9  10 11 11 12 13  13 13 14 14 14

y = -8/16, θ = 0        0 0 0 0 0  1 1 2 3 4  4 5 6 7 8  9 10 11 11 12  12 13 13 14 14
y = -8/16, θ = π/8      0 0 0 0 0  1 1 2 3 3  4 5 6 7 8  9 10 11 11 12  12 13 13 14 14
y = -8/16, θ = π/4      0 0 0 1 1  1 2 2 3 4  5 5 6 7 8  9 10 11 11 12  12 13 13 14 14
y = -8/16, θ = 3π/8     0 0 1 1 1  2 2 3 4 4  5 6 7 7 8  9 10 11 11 12  12 13 13 14 14
y = -8/16, θ = π/2      0 1 1 1 2  2 3 3 4 5  5 6 7 8 9  10 10 11 12 12  13 13 13 14 14

y = -7/16, θ = 0        0 0 0 0 0  0 1 1 2 3  4 5 6 7 8  8 9 10 11 11  12 12 13 13 13
y = -7/16, θ = π/8      0 0 0 0 0  1 1 2 2 3  4 5 6 7 8  8 9 10 11 11  12 12 13 13 13
y = -7/16, θ = π/4      0 0 0 1 1  1 2 2 3 3  4 5 6 7 8  8 9 10 11 11  12 12 13 13 13
y = -7/16, θ = 3π/8     0 0 1 1 1  2 2 3 3 4  5 5 6 7 8  9 9 10 11 11  12 12 13 13 13
y = -7/16, θ = π/2      0 1 1 1 2  2 3 3 4 5  5 6 7 8 8  9 10 11 11 12  12 13 13 13 13

y = -6/16, θ = 0        0 0 0 0 0  0 0 1 1 2  3 4 5 6 7  8 9 9 10 11  11 12 12 12 13
y = -6/16, θ = π/8      0 0 0 0 0  0 1 1 2 2  3 4 5 6 7  8 9 9 10 11  11 12 12 12 13
y = -6/16, θ = π/4      0 0 0 0 1  1 1 2 2 3  4 4 5 6 7  8 9 9 10 11  11 12 12 12 13
y = -6/16, θ = 3π/8     0 0 1 1 1  2 2 2 3 4  4 5 6 7 7  8 9 10 10 11  11 12 12 12 13
y = -6/16, θ = π/2      0 1 1 1 2  2 2 3 4 4  5 6 7 7 8  9 9 10 11 11  12 12 12 13 13

y = -5/16, θ = 0        0 0 0 0 0  0 0 0 1 2  3 3 4 5 6  7 8 9 9 10  11 11 11 12 12
y = -5/16, θ = π/8      0 0 0 0 0  0 0 1 1 2  3 3 4 5 6  7 8 9 9 10  11 11 11 12 12
y = -5/16, θ = π/4      0 0 0 0 1  1 1 2 2 3  3 4 5 6 6  7 8 9 9 10  11 11 11 12 12
y = -5/16, θ = 3π/8     0 0 0 1 1  1 2 2 3 3  4 5 5 6 7  8 8 9 10 10  11 11 11 12 12
y = -5/16, θ = π/2      0 0 1 1 1  2 2 3 3 4  5 5 6 7 8  8 9 9 10 11  11 11 12 12 12

y = -4/16, θ = 0        0 0 0 0 0  0 0 0 0 1  2 3 4 5 5  6 7 8 9 9  10 10 11 11 11
y = -4/16, θ = π/8      0 0 0 0 0  0 0 0 1 1  2 3 4 5 5  6 7 8 9 9  10 10 11 11 11
y = -4/16, θ = π/4      0 0 0 0 0  1 1 1 2 2  3 3 4 5 6  7 7 8 9 9  10 10 11 11 11
y = -4/16, θ = 3π/8     0 0 0 1 1  1 2 2 2 3  4 4 5 6 6  7 8 8 9 9  10 10 11 11 11
y = -4/16, θ = π/2      0 0 1 1 1  2 2 3 3 4  4 5 6 6 7  8 8 9 9 10  10 11 11 11 11

y = -3/16, θ = 0        0 0 0 0 0  0 0 0 0 0  1 2 3 4 5  6 6 7 8 8  9 9 10 10 10
y = -3/16, θ = π/8      0 0 0 0 0  0 0 0 1 1  1 2 3 4 5  6 6 7 8 8  9 9 10 10 10
y = -3/16, θ = π/4      0 0 0 0 0  0 1 1 1 2  2 3 4 4 5  6 7 7 8 8  9 9 10 10 10
y = -3/16, θ = 3π/8     0 0 0 1 1  1 1 2 2 3  3 4 4 5 6  6 7 8 8 9  9 10 10 10 10
y = -3/16, θ = π/2      0 0 1 1 1  2 2 2 3 4  4 5 5 6 7  7 8 8 9 9  10 10 10 10 11

y = -2/16, θ = 0        0 0 0 0 0  0 0 0 0 0  0 1 2 3 4  5 5 6 7 8  8 9 9 9 10
y = -2/16, θ = π/8      0 0 0 0 0  0 0 0 0 1  1 2 2 3 4  5 5 6 7 8  8 9 9 9 10
y = -2/16, θ = π/4      0 0 0 0 0  0 1 1 1 1  2 2 3 4 4  5 6 6 7 8  8 9 9 9 10
y = -2/16, θ = 3π/8     0 0 0 0 1  1 1 2 2 2  3 3 4 5 5  6 6 7 7 8  8 9 9 9 10
y = -2/16, θ = π/2      0 0 1 1 1  1 2 2 3 3  4 4 5 6 6  7 7 8 8 8*  9 9 9 9 10

y = -1/16, θ = 0        0 0 0 0 0  0 0 0 0 0  0 0 1 2 3  4 5 5 6 7  7 8 8 8 9
y = -1/16, θ = π/8      0 0 0 0 0  0 0 0 0 0  1 1 2 2 3  4 5 5 6 7  7 8 8 8 9
y = -1/16, θ = π/4      0 0 0 0 0  0 0 1 1 1  2 2 3 3 4  4 5 6 6 7  7 8 8 8 9
y = -1/16, θ = 3π/8     0 0 0 0 1  1 1 1 2 2  2 3 3 4 5  5 6 6 7 7  7 8 8 8 9
y = -1/16, θ = π/2      0 0 1 1 1  1 2 2 2 3  3 4 4 5 6  6 6 7 7 8  8 8 8 9 9

y = 0/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 1 2  3 4 4 5 6  6 7 7 7 8
y = 0/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 1 1 2 2  3 4 4 5 6  6 7 7 7 8
y = 0/16, θ = π/4       0 0 0 0 0  0 0 0 1 1  1 2 2 3 3  4 4 5 5 6  6 7 7 7 8
y = 0/16, θ = 3π/8      0 0 0 0 0  1 1 1 1 2  2 3 3 3 4  4 5 5 6 6  7 7 7 8 8
y = 0/16, θ = π/2       0 0 0 1 1  1 1 2 2 3  3 4 4 4 5  5 6 6 7 7  7 7 8 8 8

y = 1/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 0 1  2 3 3 4 5  5 6 6 6 7
y = 1/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 0 1 1 2  2 3 3 4 5  5 6 6 6 7
y = 1/16, θ = π/4       0 0 0 0 0  0 0 0 0 1  1 1 2 2 2  3 3 4 4 5  5 6 6 6 7
y = 1/16, θ = 3π/8      0 0 0 0 0  0 1 1 1 1  2 2 3 3 3  4 4 5 5 5  6 6 6 7 7
y = 1/16, θ = π/2       0 0 0 1 1  1 1 2 2 2  3 3 4 4 4  5 5 5 6 6  6 6 7 7 7

y = 2/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  1 2 3 3 4  4 5 5 6 6
y = 2/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 0 0 1 1  1 2 3 3 4  4 5 5 6 6
y = 2/16, θ = π/4       0 0 0 0 0  0 0 0 0 0  1 1 1 2 2  2 3 3 4 4  5 5 5 6 6
y = 2/16, θ = 3π/8      0 0 0 0 0  0 0 1 1 1  1 2 2 2 3  3 4 4 4 5  5 5 6 6 6
y = 2/16, θ = π/2       0 0 0 0 1  1 1 1 2 2  2 3 3 3 4  4 4 5 5 5  5 6 6 6 6

y = 3/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 1 2 2 3  4 4 4 5 5
y = 3/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 0 0 0 1  1 1 2 2 3  3 4 4 5 5
y = 3/16, θ = π/4       0 0 0 0 0  0 0 0 0 0  0 1 1 1 1  2 2 3 3 3  4 4 4 5 5
y = 3/16, θ = 3π/8      0 0 0 0 0  0 0 1 1 1  1 1 2 2 2  3 3 3 4 4  4 4 5 5 5
y = 3/16, θ = π/2       0 0 0 0 1  1 1 1 1 2  2 2 3 3 3  4 4 4 4 5  5 5 5 5 5

y = 4/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 1 1 2  3 3 4 4 4
y = 4/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  1 1 1 2 2  3 3 4 4 4
y = 4/16, θ = π/4       0 0 0 0 0  0 0 0 0 0  0 0 1 1 1  1 2 2 2 3  3 3 4 4 4
y = 4/16, θ = 3π/8      0 0 0 0 0  0 0 0 1 1  1 1 1 2 2  2 2 3 3 3  4 4 4 4 4
y = 4/16, θ = π/2       0 0 0 0 0  1 1 1 1 1  2 2 2 2 3  3 3 3 4 4  4 4 4 4 4

y = 5/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 1 1  2 2 3 3 3
y = 5/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 1 1 2  2 2 3 3 3
y = 5/16, θ = π/4       0 0 0 0 0  0 0 0 0 0  0 0 0 1 1  1 1 2 2 2  2 3 3 3 3
y = 5/16, θ = 3π/8      0 0 0 0 0  0 0 0 0 1  1 1 1 1 2  2 2 2 2 3  3 3 3 3 3
y = 5/16, θ = π/2       0 0 0 0 0  0 1 1 1 1  1 2 2 2 2  2 3 3 3 3  3 3 3 4 4

y = 6/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 1  1 2 2 2 3
y = 6/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 1 1  1 2 2 2 3
y = 6/16, θ = π/4       0 0 0 0 0  0 0 0 0 0  0 0 0 0 1  1 1 1 1 2  2 2 2 2 3
y = 6/16, θ = 3π/8      0 0 0 0 0  0 0 0 0 0  0 1 1 1 1  1 2 2 2 2  2 2 3 3 3
y = 6/16, θ = π/2       0 0 0 0 0  0 0 1 1 1  1 1 1 2 2  2 2 2 2 3  3 3 3 3 3

y = 7/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  1 1 1 2 2
y = 7/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 1  1 1 1 2 2
y = 7/16, θ = π/4       0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 1 1 1 1  1 2 2 2 2
y = 7/16, θ = 3π/8      0 0 0 0 0  0 0 0 0 0  0 0 1 1 1  1 1 1 2 2  2 2 2 2 2
y = 7/16, θ = π/2       0 0 0 0 0  0 0 0 1 1  1 1 1 1 1  2 2 2 2 2  2 2 2 2 2

y = 8/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 1 1 1
y = 8/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 1 1 1 1
y = 8/16, θ = π/4       0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 1 1 1  1 1 1 1 2
y = 8/16, θ = 3π/8      0 0 0 0 0  0 0 0 0 0  0 0 0 1 1  1 1 1 1 1  1 1 2 2 2
y = 8/16, θ = π/2       0 0 0 0 0  0 0 0 0 1  1 1 1 1 1  1 1 1 2 2  2 2 2 2 2

y = 9/16, θ = 0         0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 1 1
y = 9/16, θ = π/8       0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 1 1 1
y = 9/16, θ = π/4       0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 1  1 1 1 1 1
y = 9/16, θ = 3π/8      0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  1 1 1 1 1  1 1 1 1 1
y = 9/16, θ = π/2 *     0 0 0 0 0  0 0 0 0 0  0 1 1 1 1  1 1 1 1 1  1 1 1 1 1

y = 10/16, θ = 0        0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 1
y = 10/16, θ = π/8      0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 1
y = 10/16, θ = π/4      0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 1 1 1
y = 10/16, θ = 3π/8     0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 1 1  1 1 1 1 1
y = 10/16, θ = π/2      0 0 0 0 0  0 0 0 0 0  0 0 0 1 1  1 1 1 1 1  1 1 1 1 1

y = 11/16, θ = 0        0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0
y = 11/16, θ = π/8      0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0
y = 11/16, θ = π/4      0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0
y = 11/16, θ = 3π/8     0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 1 1 1
y = 11/16, θ = π/2      0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 1 1  1 1 1 1 1

y = 12/16, θ = 0        0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0
y = 12/16, θ = π/8      0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0
y = 12/16, θ = π/4      0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0
y = 12/16, θ = 3π/8     0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0
y = 12/16, θ = π/2      0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0  0 0 0 0 0




Chapter 7

Filtered  Scan-Conversion 

This chapter shows that algorithms for scan-conversion with filtering are similar to the algorithms
for bit-map scan-conversion discussed before, with the only difference being the need for extra
precision in distance parameters to allow the computation of grayscale intensities. The amount of
extra precision required is approximately the same as the number of bits of precision in the pixel
intensity values. In this chapter we shall restrict our discussion to unit-thickness lines and trapezoids,
and the use of the conical filter with unit radius.

7.1.  Filtered  line  drawing 

The first subsection shall present a simple extension to the Bresenham algorithm that can be used
to draw gray-scale lines. The second subsection shall discuss algorithms that allow the parallel update
of several pixels at a time.

7.1.1.  Incremental algorithms for filtering edges

In Chapter 6 we presented a table that can be used to determine the intensities for the pixels that
intersect the edges of a line. Only the perpendicular distance from the pixel to the center of the line
is required to determine the intensity for that pixel. Hence the algorithms to draw a line must
compute the perpendicular distance p between each pixel center and the line. To reduce
computation, this calculation is performed incrementally, as in Bresenham's algorithm [Bresenham
65]. This discussion shall be restricted to unit thickness lines drawn from (0,0) to (dx,dy) in the first
octant (i.e., 0 ≤ dy ≤ dx); extensions to other lines are obvious. Such lines intensify two or three pixels
in each column of pixels (see Figure 7-1).⁵ The algorithm will keep track of the location of the center
pixel and the perpendicular distance to the line's center from the pixel center.


5  The algorithms and tables presented in this chapter are designed to produce images using 4-bit intensity values. A diagonal
line at almost 45 degrees will actually intersect five rather than three pixels, but the top and bottom pixels are intensified at less
than 0.2% of the maximum. The algorithm presented ignores these pixels because a 4-bit intensity value will record zero for
such an intensity. If a wider range of intensities is available, then the algorithm may be modified in an obvious way to illuminate
more pixels in each column; the tables must also provide more precision.


Figure 7-1:  Three pixels are shaded in each column.

We shall initially assume that line endpoints lie at pixel centers, and hence dx and dy are integers.
From our first-octant assumption, the slope m = dy/dx has a value between 0 and 1. In the algorithm
below, x and y track the central pixel in each column through which the line passes, and v is the
vertical distance from that pixel to the line. This vertical distance v is a signed value; a positive value
indicates that the center of the line is above the center of the pixel⁶, and a negative value indicates the
opposite. The variable s is a threshold distance used to decide whether the central pixel in the next
column lies diagonally or horizontally across from the central pixel in the current column.

VAR x,y : INTEGER; v,m,s : REAL;
m := dy/dx;
v := 0;  s := 0.5-m;
y := 0;

FOR x := 0 TO dx DO
BEGIN
    Shade pixels at (x,y-1), (x,y), (x,y+1);
    IF (v > s) THEN
    BEGIN
        y := y+1;
        v := v+m-1;
    END
    ELSE
        v := v+m;
END;

The pixel at (x,y) is located at a vertical distance v from the line. Because v is measured upwards
from the pixel center to the line, the pixels at (x,y−1) and (x,y+1) are at vertical distances v+1 and
v−1 respectively.
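The tracking loop can be exercised directly. The following Python sketch is a transcription of the loop for illustration; the function name and the returned list are assumptions, and the shading step is left out. It records the center pixel and the signed vertical distance v in every column, which at each step equals m·x − y.

```python
def track_center_pixels(dx, dy):
    """Return [(x, y, v)] for a first-octant line from (0,0) to (dx,dy).

    y is the row of the center pixel in column x and v is the signed
    vertical distance from that pixel to the line (positive when the
    line lies above the pixel center).
    """
    m = dy / dx
    s = 0.5 - m                  # threshold for a diagonal step
    y, v = 0, 0.0
    columns = []
    for x in range(dx + 1):
        columns.append((x, y, v))
        if v > s:                # line rises past the half-way point
            y += 1
            v += m - 1
        else:
            v += m
    return columns
```

For a line to (8,4), for example, the center pixel steps up in every second column and v alternates between 0 and 0.5.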


6  y increases upwards.




In order to determine the shade of the pixels, we need to compute the perpendicular distance p
from the pixel to the line. The vertical distances are related to the perpendicular distances by a factor
of c = dx/sqrt(dx² + dy²), such that p = cv. The following algorithm shows the modifications to
compute the perpendicular distance.

VAR x,y : INTEGER; p,m,c,s : REAL;
m := dy/dx;
c := 1/sqrt(m*m+1);
p := 0;  s := (0.5-m)*c;
y := 0;

FOR x := 0 TO dx DO
BEGIN
    Shade pixels at (x,y-1), (x,y), (x,y+1);
    IF (p > s) THEN
    BEGIN
        y := y+1;
        p := p+(m-1)*c;
    END
    ELSE
        p := p+m*c;
END;

The two expressions (m−1)c and mc can be precomputed and do not have to be computed
repeatedly in the inner loop. To compute the pixel shades, the absolute values of p, p+c, and p−c
are used as indices into Table 6-2 to determine intensities at (x,y), (x,y−1), and (x,y+1) respectively;
since p is positive when the line lies above the pixel center, the lower pixel is c further from the line
and the upper pixel c closer. The table look-up is performed by thresholding the absolute distances
to 20/16 and using the new distance value to index into the table.
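The perpendicular-distance loop can be sketched in Python as follows. This is an illustration under stated assumptions: the function name and return structure are hypothetical, the contents of Table 6-2 are not reproduced, so the sketch stops at the clamped distances that would index the table, and the neighbor distances follow the sign convention that p is positive when the line lies above the pixel center (so the pixel one row up is at p − c and the pixel one row down at p + c).

```python
import math

MAX_DIST = 20 / 16   # distances are thresholded to this before the look-up


def filtered_line_distances(dx, dy):
    """Return [(x, {row: clamped |distance|})] for a line (0,0)-(dx,dy)."""
    m = dy / dx
    c = 1.0 / math.sqrt(m * m + 1.0)
    s = (0.5 - m) * c
    step_diag = (m - 1.0) * c    # precomputed once, outside the loop
    step_horiz = m * c
    y, p = 0, 0.0
    columns = []
    for x in range(dx + 1):
        columns.append((x, {
            y - 1: min(abs(p + c), MAX_DIST),   # pixel below the center pixel
            y:     min(abs(p), MAX_DIST),       # center pixel
            y + 1: min(abs(p - c), MAX_DIST),   # pixel above the center pixel
        }))
        if p > s:
            y += 1
            p += step_diag
        else:
            p += step_horiz
    return columns
```

Each stored value agrees with the directly computed distance |dy·x − dx·y| / sqrt(dx² + dy²) for the same pixel, which is a convenient check on the incremental updates.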


The incremental algorithm presented above does not compute the sampled values of pixels on or
near the endpoints of the line. The situation is illustrated in Figure 7-2, which shows an endpoint of a
line. The pixels shown with their surrounding filters are not intensified correctly by our algorithm; in
fact some of them are not intensified at all. The intensity of each of these pixels is obtained by
integrating the filter function for each pixel with the exact shape of the line. There are several
methods available to perform such computations.


One approach to computing the endpoint intensities is by using exact geometric operations
[Feibush 80] to split the endpoint into geometries for which the sampled intensity is precomputed.
An approximation can be obtained by sampling the endpoint at points much more closely spaced
than pixel centers. Both these approaches are computationally expensive.


An easier approach is to precompute the endpoint intensities and store them in a table, much as we
do for line intensities. To display the endpoint of a line, the intensities of the six pixels in the vicinity
are determined from the table. For 4-bit gray-scale accuracy, it is sufficient to compute a set of
endpoints for lines with slopes between 0 and 1 at intervals of 1/16 in slope (see Table 7-1). The
entries in the table are in the same configuration as the six pixels shown in Figure 7-2. Using a
combination of mirroring and transposition transformations, these endpoint intensities can be used
for lines in every octant.


Figure 7-2:  End of a line, showing the six pixels that may be illuminated.

The table look-up technique for endpoint intensities allows us to alter the anti-aliasing filter
function and the endpoint shape by simply replacing the table. As discussed before, the filter
function is varied until the resulting image is aesthetically satisfactory. The endpoints of lines can also
have different shapes; the flat and circular ones are normally used.

The algorithm can be adapted to display lines and edges drawn between endpoints that are not
integers; that is, that do not lie on the pixel grid. Such a scheme is necessary to avoid additional
aliasing, for example the jitter in a moving image caused by quantizing endpoints to lie on pixel
centers. To accommodate non-integer endpoints, two modifications must be made. First, the
initialization of the algorithm must be changed to compute the coordinate of the first pixel to be
illuminated and to compute the initial value for p at this point. The following algorithm shows these
modifications. The non-integral endpoints are (x1,y1) and (x2,y2).


Slope = 0/16         Slope = 1/16         Slope = 2/16
0.000  0.056         0.000  0.063         0.000  0.072
0.000  0.393         0.000  0.390         0.000  0.390
0.000  0.056         0.000  0.048         0.000  0.042

Slope = 3/16         Slope = 4/16         Slope = 5/16
0.000  0.082         0.000  0.094         0.000  0.107
0.000  0.390         0.000  0.390         0.000  0.390
0.000  0.037         0.000  0.032         0.000  0.028

Slope = 6/16         Slope = 7/16         Slope = 8/16
0.000  0.121         0.000  0.136         0.000  0.152
0.001  0.390         0.001  0.390         0.002  0.391
0.000  0.025         0.000  0.022         0.000  0.019

Slope = 9/16         Slope = 10/16        Slope = 11/16
0.000  0.169         0.000  0.187         0.000  0.206
0.002  0.390         0.003  0.390         0.003  0.390
0.000  0.017         0.000  0.015         0.000  0.013

Slope = 12/16        Slope = 13/16        Slope = 14/16
0.000  0.225         0.000  0.245         0.000  0.264
0.004  0.390         0.005  0.390         0.006  0.390
0.000  0.012         0.000  0.010         0.000  0.009

Slope = 15/16        Slope = 16/16
0.001  0.284         0.001  0.304
0.007  0.390         0.007  0.391
0.000  0.008         0.000  0.007

Table 7-1:  Endpoints for lines with different slopes.

VAR x,y : INTEGER; p,m,c,s : REAL;
m := (y2-y1)/(x2-x1);
c := 1/sqrt(m*m+1);
y1 := y1+m*(round(x1)-x1);
y := round(y1);
p := (y1-y)*c;
s := (0.5-m)*c;

FOR x := round(x1) TO round(x2) DO
BEGIN
    Shade pixels at (x,y-1), (x,y), (x,y+1);
    IF (p > s) THEN
    BEGIN
        y := y+1;
        p := p+(m-1)*c;
    END
    ELSE
        p := p+m*c;
END;
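The changed initialization can be illustrated with a short Python sketch. The helper names are hypothetical, and the rounding rule (halves round up) is an assumption, since the text does not say how round() treats ties.

```python
import math


def round_half_up(v):
    """Round to the nearest integer, halves upwards (assumed tie rule)."""
    return math.floor(v + 0.5)


def init_noninteger(x1, y1, x2, y2):
    """Return (x_first, x_last, y, p, m, c, s) for endpoints off the grid."""
    m = (y2 - y1) / (x2 - x1)
    c = 1.0 / math.sqrt(m * m + 1.0)
    x_first = round_half_up(x1)
    y_line = y1 + m * (x_first - x1)   # height of the line in the first column
    y = round_half_up(y_line)          # center pixel of the first column
    p = (y_line - y) * c               # its signed perpendicular distance
    s = (0.5 - m) * c
    return x_first, round_half_up(x2), y, p, m, c, s
```

From these values the main loop proceeds exactly as in the integer case, so only the set-up cost changes.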




The second change is that samples near endpoints need to take account of the exact location of the
line endpoint. We either have to spend a lot of processing to filter these pixels or use a much larger
table to look up the filtered values. If the table contains endpoints at intervals of 1/16 for each
coordinate and an interval of 1/16 for the slope, then the size of the table at six pixels per endpoint
would be 29478 entries! Chapter 8 discusses an algorithm which can be used to move images by
subpixel distances. This would reduce the storage requirements of the endpoint tables.
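One reading that reproduces this table size, assumed here rather than stated in the text, is that each of the two endpoint coordinates and the slope takes the 17 values 0/16 through 16/16:

```python
def endpoint_table_entries(steps_per_unit=16, pixels_per_endpoint=6):
    """Size of an endpoint table indexed by sub-pixel x, sub-pixel y and slope.

    Assumes each of the three parameters takes steps_per_unit + 1 values
    (both ends of the interval included).
    """
    positions = steps_per_unit + 1
    return positions ** 3 * pixels_per_endpoint
```

With the default arguments this gives 17 × 17 × 17 × 6 = 29478 entries.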

7.1.2.  Drawing  lines  using  strokes 

As discussed in Chapter 5, the ability to update several pixels at a time can be used to increase
the speed of line drawing algorithms. The line drawing algorithms can now compose lines using
shorter segments which are joined together to form the line. This technique can be used effectively in
display architectures which allow parallel update, the primary subject of this thesis. As we have seen
before, line drawing requires the ability to update a square area in order to achieve a symmetric
speedup for both shallow and steep lines.

We shall present two different strategies which draw lines using parallel updates. The first scheme
computes each stroke by using a parallel set of processors, while the second one uses a precomputed
set of strokes and the line drawing algorithm merely chooses which of the set of strokes to use and
where to place them. Both techniques are described for lines with integer endpoints in the first octant
originating at (0,0) and drawn to (dx,dy). They can be extended to non-integer endpoints in the same
manner as the incremental algorithm.

7.1.2.1.  Computed strokes

A processor-per-pixel has always been the dream of all people who design raster scan displays. If
each point on the display had an independent processor, then the line drawing algorithm can use this
immense parallelism available to draw each line in one step. Each processor would merely have to
determine whether the pixel it is responsible for lies on or outside the line, and intensify the pixel
appropriately. Although this technique updates the line in a time which is independent of the length
of the line, it is wasteful of the processing power available because most of the processors represent
pixels nowhere near the line. A tradeoff would have a set of processors which can update a part of the
display and use them iteratively to compute the whole line. Assigning these processors to pixels is
equivalent to the memory organization problem. If they can be used to update several points along a
scan in parallel, then horizontal lines can be drawn faster than vertical lines and vice versa. For this
reason these processors should update a square region of the display.




If the set of processors can be used to update an N×N square region of the display, then lines are
drawn by updating N-pixel strokes at a time. The line drawing algorithm would position the
processors at a certain point on the display, compute the intensities of all pixels in the square and
then update the memory. The processors would then be positioned to the next point along the line
and the steps iterated until the whole line is drawn.

Once again we shall restrict the discussion to lines drawn from (0,0) to (dx,dy) in the first octant.
The algorithm described can be modified easily for other lines in other octants. As seen before a
gray-scale line in the first octant with unit thickness illuminates two or three pixels in each column. Hence
a diagonal line illuminates pixels in N+1 rows for a span of N columns (see Figure 7-3). Since the
display only allows the update of N×N squares, our algorithm can step by only N−1 columns for each
update. Conversely, lines in the second octant will be drawn by stepping up N−1 rows.


Figure 7-3:  N+1 rows of pixels are illuminated in a span of N columns.

Our line drawing algorithm makes two assumptions. First, we assume that all pixels turned on are
within the rectangle enclosed by the two end points of the line. Second, the intensity of all the pixels
in this rectangle can be computed as if the line extended infinitely in both directions. This
assumption produces incorrect endpoint intensities and can be fixed by placing endpoint strokes
(Table 7-1) at each endpoint of the line.

The first stroke will be located at (0,−1) and subsequent ones will be located at ((N−1)i,
⌊i(N−1)dy/dx⌋ − 1) for 0 < i ≤ ⌊dx/(N−1)⌋.
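As a worked example of this placement rule, the following Python sketch lists the stroke origins for a given line. The function name is hypothetical, and the brackets in the formula are read here as the floor function, which is an assumption about the printed notation.

```python
def stroke_origins(dx, dy, n):
    """Origins of the NxN update squares for a line from (0,0) to (dx,dy).

    Each stroke advances N-1 columns; its origin lies one row below the
    (floored) height of the line at the first column of the stroke.
    """
    step = n - 1
    return [(step * i, (i * step * dy) // dx - 1)
            for i in range(dx // step + 1)]
```

For dx = 14, dy = 7 and N = 8 this yields origins at (0,−1), (7,2) and (14,6).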

Within each stroke, each pixel needs to compute the perpendicular distance to the line in order to
determine the intensity of that pixel. Figure 7-4 shows this situation and demonstrates that if we have
three precomputed values of

    sinθ = dy/sqrt(dx² + dy²),
    cosθ = dx/sqrt(dx² + dy²),

and derr_i, which is the perpendicular distance from the stroke origin (x_i,y_i) to the line, then each pixel
(x,y) in the stroke can compute its perpendicular distance as

    d = (y−y_i)cosθ − (x−x_i)sinθ + derr_i.


Figure 7-4:  Computing the perpendicular distance to the line in parallel.

The three precomputed values can be broadcast to the N×N processor array before the distance
computation. Notice that sinθ and cosθ remain constant for the whole line while derr_i has to be
updated for each stroke and can be computed as

    derr_i = y_i cosθ − x_i sinθ.
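The per-pixel computation can be checked with a small Python sketch, in which a nested list comprehension stands in for the N×N processor array. The function name and the row-major result layout are assumptions made for illustration.

```python
import math


def block_distances(dx, dy, xi, yi, n):
    """Signed perpendicular distances for the NxN block with origin (xi, yi).

    The line runs from (0,0) towards (dx,dy); result[row][col] is the
    distance for pixel (xi + col, yi + row), positive above the line.
    """
    h = math.hypot(dx, dy)
    sin_t, cos_t = dy / h, dx / h
    derr = yi * cos_t - xi * sin_t     # distance of the block origin
    return [[(y - yi) * cos_t - (x - xi) * sin_t + derr
             for x in range(xi, xi + n)]
            for y in range(yi, yi + n)]
```

Every entry equals (dx·y − dy·x)/sqrt(dx² + dy²), the signed distance of the pixel from the line through the origin, so each processor needs only its own offset within the block and the three broadcast values.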


The following algorithm combines these steps and uses the precomputed values of sinθ, cosθ, s
(= ⌊(N−1)dy/dx⌋), and a (= 2(N−1)dy − 2s·dx).




yi := -1;
derri := -cosθ;
r := a - dx;

for xi := 0 to dx by (N-1) do begin
    { The following nested loop is executed in parallel by the
      NxN processors }
    for x := 0 to (N-2) do
        for y := 0 to (N-1) do begin
            dxy := y*cosθ - x*sinθ + derri;
            Intensity := LineTable[dxy];
        end;

    if r > 0 then begin
        yi := yi + s + 1;
        r := r + a - 2*dx;
        derri := derri + (s+1)*cosθ - (N-1)*sinθ;
    end
    else begin
        yi := yi + s;
        r := r + a;
        derri := derri + s*cosθ - (N-1)*sinθ;
    end;
end;


The integer prologue used in the bit-map stroke algorithms can be used to precompute the values
of a and s, while table look up can determine values for sinθ and cosθ. Notice that by using
incremental techniques all distance computations have been reduced to multiplications by numbers
in the range between 0 and (N−1), which can be performed in hardware using an adder with (log N)
inputs.
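The outer loop of the stroke algorithm can be sketched in Python as below. The sketch assumes that a is twice the remainder of (N−1)dy divided by dx, i.e. a = 2((N−1)dy − s·dx), which is the form that makes the test r > 0 a round-to-nearest decision; the function name is hypothetical and the per-pixel inner loop is omitted.

```python
import math


def computed_stroke_walk(dx, dy, n):
    """Return [(xi, yi, derri)] for each stroke of a line (0,0)-(dx,dy)."""
    step = n - 1
    h = math.hypot(dx, dy)
    sin_t, cos_t = dy / h, dx / h
    s = (step * dy) // dx              # whole rows climbed per stroke
    a = 2 * (step * dy - s * dx)       # assumed: twice the remainder
    yi, derr, r = -1, -cos_t, a - dx   # first stroke sits at (0, -1)
    walk = []
    for xi in range(0, dx + 1, step):
        walk.append((xi, yi, derr))
        if r > 0:                      # accumulated error exceeds half a row
            yi += s + 1
            r += a - 2 * dx
            derr += (s + 1) * cos_t - step * sin_t
        else:
            yi += s
            r += a
            derr += s * cos_t - step * sin_t
    return walk
```

Under this assumption the incrementally updated derri stays equal to yi·cosθ − xi·sinθ, and the row yi + 1 stays within half a pixel of the true height of the line at column xi.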


7.1.2.2.  Precomputed Strokes

Drawing bit-map lines using precomputed strokes was the subject of Chapter 5. It discusses several
algorithms and their tradeoffs with respect to their speed and the total number of strokes required to
produce an arbitrary line. All algorithms assume the ability to place the origin of any stroke at an
arbitrary pixel of the display. This implies that all strokes can be defined such that the first pixel
displayed is located at the origin (0,0) and the translation of every point of the stroke can be used to
position the stroke at any pixel boundary. We shall once again make this assumption and also restrict
this discussion to lines in the first octant. Lines in the other octants can be drawn either by using a
combination of mirroring and transposing of strokes, or by using a larger set of strokes.


In the discussion on bit-map lines, we concluded that the positioning of pixels on lines drawn using
strokes has two sources of error. The first is the error caused in the placement of the stroke and the
second is error within the stroke. The most simple-minded bit-map algorithms cause an error of 1/2
in the stroke placement, and another error of 1/2 within a stroke which could be in the same
direction as the stroke placement error, resulting in a total possible error of approximately one. The
appearance of the lines is improved by using a larger number of strokes with the added parameter of
sub-pixel placement. The line drawing algorithm then computes the stroke position to a higher
precision and uses the sub-pixel offset to choose from a larger set of strokes. The number of strokes
increases as the square of the sub-pixel resolution.

The situation is not very different in the case of gray-scale lines. Since gray-scale intensities are
computed using sub-pixel positioning of the line, an algorithm which computes the stroke positions
to the same sub-pixel resolution and also uses the high-precision value of the slope available to
choose from a larger set of strokes could be used to produce grayscale lines. In our filtering example
of four bits of gray-scale and a conical filter function, distances have to be computed to 1/16th of a
pixel, which can be done by adding four extra bits of precision to coordinates. In practice fewer extra
bits of precision produce satisfactory results.

The algorithm used for bit-map lines can also be used here for gray-scale lines. Strokes for the first
octant will originate between 0.5 and 1.5 spaced by the sub-pixel resolution, and end between 0.5 and
N−0.5 also spaced by the same sub-pixel resolution. Let l be the number of extra bits of precision
provided to compute sub-pixel positioning, and j = 2^l. The total number of strokes required is hence
j²(N−1). We can choose to index the strokes as (m,n) where m defines the origin of the stroke and
varies such that 0 ≤ m < j, and n is the height of the stroke such that 0 ≤ n < j(N−1).
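The stroke count follows from the two index ranges: j possible origins times j(N−1) possible heights. A short Python sketch with a hypothetical function name makes the arithmetic explicit.

```python
def stroke_count(extra_bits, n):
    """Number of precomputed first-octant strokes for an NxN update square.

    extra_bits is l, the number of sub-pixel bits, so j = 2**l.
    """
    j = 2 ** extra_bits
    origins = j                # sub-pixel offsets of the stroke origin
    heights = j * (n - 1)      # sub-pixel heights of the stroke
    return origins * heights
```

With l = 2 and an 8×8 update square (N = 8 is an assumed value) this gives 4 × 28 = 112 strokes, and with no extra bits it reduces to the N−1 = 7 strokes of a plain bit-map scheme.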

The following algorithm can be used to draw lines in the first octant. It assumes that the line is
drawn from (0,0) to (dx,dy) and a prologue is used to compute s (= ⌊j(N−1)dy/dx⌋), and a
(= 2j(N−1)dy − 2js·dx).

y := 0;
r := a - j*dx;

for x := 0 to dx by (N-1) do begin
    if (r > 0) then begin
        Display Stroke (y%j, s+1) at (x, round(y/j)-1);
        y := y + s + 1;
        r := r + a - 2*j*dx;
    end
    else begin
        Display Stroke (y%j, s) at (x, round(y/j)-1);
        y := y + s;
        r := r + a;
    end
end




In this algorithm both y and s are represented with an integer part and an l-bit fractional part.
Because of the stroke structure we have defined, the strokes are placed one pixel below the integer
rounded value of y (i.e. at round(y/j)−1). The stroke would hence be placed at (x, round(y/j)−1) and
the line within the stroke would originate at y/j − round(y/j) + 1. The algorithm can be modified to add a
constant to the initial value of y and then use the integer part of y as the origin of the stroke.

y := j/2;
r := a - j*dx;

for x := 0 to dx by (N-1) do begin
    if (r > 0) then begin
        Display Stroke (y%j, s+1) at (x, y/j-1);
        y := y + s + 1;
        r := r + a - 2*j*dx;
    end
    else begin
        Display Stroke (y%j, s) at (x, y/j-1);
        y := y + s;
        r := r + a;
    end
end
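The effect of the initial constant can be illustrated in Python with y held as an integer count of 1/j pixel units. The function names are hypothetical; the sketch shows only the placement arithmetic, not the stroke selection.

```python
import math


def biased(y_units, j):
    """Add the half-pixel constant j/2 to a position in 1/j pixel units."""
    return y_units + j // 2


def stroke_placement(y_biased, j):
    """Split a biased position into (sub-pixel index, stroke row).

    The integer part of the biased value is the rounded row, so no
    explicit rounding is needed; the stroke sits one row below it.
    """
    return y_biased % j, y_biased // j - 1
```

For even j, `biased(y, j) // j` equals y/j rounded to the nearest integer with halves rounded up, so the initial value j/2 places the first stroke at row −1 with sub-pixel index j/2.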


The value of l is an aesthetic decision. For a 4-bit grayscale display l = 2 seems to suffice and
hence a total of 112 strokes is sufficient for such display devices.


7.2.  Filtered  trapezoids 

The ability to fill trapezoids is sufficient to fill arbitrarily shaped convex polygons. The first
subsection presents an incremental algorithm to fill trapezoids. This algorithm uses the universal
anti-aliasing table discussed in the previous chapter. A second subsection shall discuss parallel updating
techniques to fill trapezoids.


7.2.1.  Incremental algorithms for filling trapezoids

Section 6.1.2 presents a table lookup technique that can be used to determine filtered intensities
for pixels that intersect trapezoids. The intensity of every pixel inside a trapezoid can be determined
using six parameters, the distances to the four edges and the angles of the two side edges. These
parameters are then used in the following manner to compute the intensity of the pixel.

    PixelIntensity(yd1,yd2,pd1,pd2,θ1,θ2)
        = F(yd1,pd1,θ1) − F(yd1,pd2,θ2) − F(yd2,pd1, …

where yd1 and yd2 are the vertical distances to the top and bottom edges respectively, pd1 and pd2 are
the perpendicular distances to the left and right edges respectively, and θ1 and θ2 are the angles
subtended by the left and right edges respectively. F is the universal polygon table.


Figure  7-5:  Polygon  parameters. 


An incremental algorithm that computes these six parameters top to bottom, left to right can be
used to fill a trapezoid. The angular parameters remain constant for the whole trapezoid (Figure 7-5).
This algorithm does not restrict any of the endpoints to be integers.

θ1 := arctan((y1-y2)/(x1-x2));
θ2 := arctan((y1-y2)/(x3-x4));
yt := round(y1)-1; yb := round(y2)+1;
xl := x1; xr := x3;
xlinc := (x2-x1)/(y2-y1);
xrinc := (x4-x3)/(y2-y1);
pdinc1 := sin θ1; pdinc2 := -sin θ2;
for y := yt to yb do begin
    yd1 := y - y1; yd2 := y2 - y;
    xl := xl + xlinc; xr := xr + xrinc;
    pd1 := (round(xl)-1-xl)*sin θ1;
    pd2 := (xr-round(xr)-1)*sin θ2;
    for x := round(xl)-1 to round(xr)+1 do begin
        Shade Pixel(x,y) with
            PixelIntensity(yd1,yd2,pd1,pd2,θ1,θ2);
        pd1 := pd1 + pdinc1;
        pd2 := pd2 + pdinc2;
    end;
end;
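The inner loop of the algorithm relies on the fact that the perpendicular distance from a pixel to a
straight edge changes by a constant amount, sin θ, for every unit step in x. The Python sketch below
checks this property numerically. The helper names (perp_distance, incremental_distances), the sign
convention and the sample coordinates are illustrative assumptions and are not taken from the thesis.

```python
import math

def perp_distance(x, y, x0, y0, theta):
    """Signed perpendicular distance from (x, y) to the line through
    (x0, y0) whose direction makes angle theta with the x axis."""
    return (x - x0) * math.sin(theta) - (y - y0) * math.cos(theta)

def incremental_distances(x_start, x_end, y, x0, y0, theta):
    """Distances for pixels x_start..x_end on scan line y,
    using a single addition per pixel after the first."""
    pd = perp_distance(x_start, y, x0, y0, theta)  # full evaluation once
    pdinc = math.sin(theta)                        # constant step along x
    out = []
    for _ in range(x_start, x_end + 1):
        out.append(pd)
        pd += pdinc
    return out

# Hypothetical edge through (1.25, 0.5); compare both ways on scan line 3.
theta = math.atan2(5.0, 2.0)
fast = incremental_distances(0, 7, 3, 1.25, 0.5, theta)
slow = [perp_distance(x, 3, 1.25, 0.5, theta) for x in range(0, 8)]
```

Both lists agree to within rounding error, which is why the inner loop needs only one addition per
edge per pixel.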


The incremental algorithm does not make use of any simpler intersection cases between each pixel and
the trapezoid. The x and y loops can be modified to do that by treating the first and last few cases in
full generality and the middle elements, which will not intersect the top and bottom edges, as a special
case.




7.2.2.  Filling  trapezoids  using  patches 

The line drawing algorithms use small strokes to form lines; we can similarly use small patches to
form trapezoids. By arguments used previously, we also need the ability to update a square area in
order to achieve symmetric updates for all trapezoids. As during line drawing, there exist two
strategies for using patches. We can either use parallel processing to compute individual patches or
use precomputed patches to assemble the desired trapezoid. We shall discuss only computed patches.
The use of precomputed patches is similar to the precomputed strokes technique of line drawing, but it
is fairly complicated and is beyond the scope of this thesis.

7.2.2.1. Computing patches

As in the case of the incremental trapezoid filling algorithm, the universal polygon table will be
used to compute the shades of individual pixels. The distance parameters will be computed using the
method used to compute distances in the parallel computation algorithm used for drawing lines.
Combining this parallel technique with an incremental algorithm which spans the whole trapezoid
and incrementally computes the distances to the edges at the corners of each patch gives the following
trapezoid filling algorithm.




θ1 := arctan((y1-y2)/(x1-x2));
θ2 := arctan((y1-y2)/(x3-x4));
yt := round(y1)-1;
yb := round(y2)+1;
xlinc := (x2-x1)*N/(y2-y1);
xrinc := (x4-x3)*N/(y2-y1);
xl := round(x1+xlinc)-1; xr := round(x3+xrinc)+1;
for yc := yt to yb by N do begin
    pdc1 := yc*cos θ1 - xl*sin θ1;
    pdc2 := yc*cos θ2 - xl*sin θ2;
    for xc := xl to xr by N do begin
        { The following nested loop is executed in NxN processors }
        for x := 0 to (N-1) do
            for y := 0 to (N-1) do
            begin
                pd1 := y*cos θ1 - x*sin θ1 + pdc1;
                pd2 := y*cos θ2 - x*sin θ2 + pdc2;
                yd1 := y - y1;
                yd2 := y2 - y;
                Shade pixel(x,y) with
                    PixelIntensity(yd1,yd2,pd1,pd2,θ1,θ2);
            end;
        pdc1 := pdc1 - N*sin θ1;
        pdc2 := pdc2 - N*sin θ2;
    end;
    xl := xl+xlinc; xr := xr+xrinc;
end;
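The patch algorithm splits each distance into a term computed once per patch corner and a term that
depends only on the position of a pixel inside the patch, so every one of the NxN processors can
evaluate its own pixel independently. The Python sketch below illustrates that split for the term
y*cos θ - x*sin θ. The function names and the sample patch position are assumptions made for this
illustration.

```python
import math

def patch_distances(xc, yc, n, theta):
    """Value of y*cos(theta) - x*sin(theta) for every pixel of the n x n
    patch whose corner is at (xc, yc). Each entry depends only on the
    local offsets (x, y) and the shared corner term pdc, so the n*n
    entries could be evaluated by n*n processors at the same time."""
    pdc = yc * math.cos(theta) - xc * math.sin(theta)   # once per patch
    return [[y * math.cos(theta) - x * math.sin(theta) + pdc
             for x in range(n)]
            for y in range(n)]

def direct(gx, gy, theta):
    """The same quantity evaluated from global pixel coordinates."""
    return gy * math.cos(theta) - gx * math.sin(theta)

theta = math.radians(25.0)
patch = patch_distances(8, 16, 4, theta)
```

Moving the patch right by N pixels changes the corner term by -N*sin θ, which matches the update of
the corner distances at the end of the xc loop.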


7.3.  Conclusion 

This chapter discussed algorithms to draw gray-scale lines and fill trapezoids. All the algorithms
presented used geometric information to look up precomputed tables containing filtered pixel
intensities. For both tasks, an incremental algorithm was first discussed which updates the display
one pixel at a time. These algorithms were then modified to update several pixels in parallel. The
parallel updates use two different methods: the pixels can either be computed in parallel or looked up
from precomputed "fonts" of line strokes and trapezoidal patches. The precomputed font technique is
easier to implement for high speeds but might not be feasible if the number of primitives is too large.
Assuming the feasibility of a few hundred primitives, the precomputed font technique is
implementable for both lines and trapezoids with the restriction that the corners have to coincide
with pixel centers. If the corners do not coincide with pixel centers, then the parallel computation
technique is more practical.



This chapter also restricts itself to using the conical filter with a radius of 1. In reality, the filter
should be a parameter in the algorithms. The shape of the filter can be varied by varying the values in
the filtering tables. Changing the extent of the filter requires a change in the algorithms, usually by
changing the upper and lower limits of the loops.

The display design presented in Chapter 9 can be used to implement both algorithms, although the
precomputed font algorithms will be faster.




Chapter  8 
Image  Processing 


Image processing refers to the manipulation of the pixels of an image to achieve desirable
transformations. The image is represented by a two-dimensional array of pixel values. Image
processing algorithms process this array and perform a wide range of transformations on the image.
Some simple transformations like translation, scaling, and rotation are used to present the image in a
more desirable format. Another set of transformations is used to enhance the image to remove defects
due to blurring or motion, or to improve features like contrast and brightness. Image processing
algorithms are broadly classified into three classes: point-by-point transformations, neighbor
transformations, and Fourier-domain transformations.

A point-by-point transformation is one in which the new image is computed by transforming each
point of the image without referring to any other point. The simplest example of such a
transformation is adding a constant to each pixel of the image to increase the brightness of the image.
Similarly, multiplying each pixel by a constant increases or decreases the contrast of the image,
depending upon whether the constant is greater than or less than one. Translation by an integer offset
requires copying pixels from one location to another, and scaling by an integer factor requires
copying pixels from one location to several. Rotations by multiples of 90 degrees can also be
performed by merely moving pixels of the image. These point-by-point transformations and several
others were the topic of discussion in Chapter 4 under the title of BitBlt.
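As a small illustration of a point-by-point transformation, the Python sketch below applies a gain
(contrast) and an offset (brightness) to every pixel independently. The function name, the clamping
to the displayable range and the 4-bit maximum of 15 are assumptions made for this example.

```python
def point_transform(image, gain=1.0, offset=0.0, max_value=15):
    """Compute new = gain * old + offset for every pixel on its own.
    No pixel looks at any other pixel, so all of them could be
    processed in parallel. Results are clamped to 0..max_value
    (a 4-bit gray scale is assumed here)."""
    return [[min(max_value, max(0, round(gain * p + offset))) for p in row]
            for row in image]

brighter = point_transform([[0, 5, 10]], offset=3)       # add a constant
more_contrast = point_transform([[0, 5, 10]], gain=2.0)  # multiply by a constant
```

With these inputs the brightened row is [3, 8, 13], and the contrast-stretched row is [0, 10, 15]
because the last value saturates at the assumed maximum.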

Simple neighbor transformations are required to perform some image processing operations like
non-integer translation and scaling, and non-perpendicular rotations. Such transformations typically
combine the pixel intensities of a few neighbors to compute the pixel intensity of the new image.
These operations together with their implementations are the topic of the first section of this chapter.

Fourier-domain transformations manipulate the image to achieve transformations in the frequency
spectrum of the image. An example of such a transformation is edge enhancement, which
corresponds to amplifying the high frequencies of the image. Other Fourier-domain transformations
are used to remove blurring and motion defects in digitization. Fourier-domain processing can be
performed by convolving the image with the Fourier-inverse of the frequency domain
transformation. Since most transformations manipulate the higher frequencies of the image, the
convolution window is fairly small, usually 3x3, 5x5, or 7x7. Convolution is the subject of the
second section of this chapter.

8.1.  Neighbor  transformations 

This section discusses the algorithms of the neighbor transformations for non-integer translation
and scaling, and for non-perpendicular rotation. The desire for such transformations is motivated by
the same reason as the desire to draw lines between non-integer endpoints, which is to portray the
grayscale display as a device with a higher virtual resolution. This is especially important in moving
images, where the quantization to integers can cause additional aliasing in the form of jitter. An
example of such aliasing could be a slow moving object on the display. If the object moves 2 pixels in
20 frames, and if the move is quantized to 1 pixel every 10th frame, then the object will seem to jump
on the display. Having the ability to move the object by 1/10th of a pixel in each frame can be used
to smooth the movement.

8.1.1.  Translation 

Translating an image on pixel boundaries is equivalent to shifting a sampled signal by a distance
that is a multiple of the sampling interval, in which case the values of the individual samples remain
the same except that they occur at a different spot. This operation can be performed by copying the
samples from the original spot to the shifted spot (Figure 8-1). The BitBlt operator defined in
Chapter 4 performed such an operation by copying a rectangular region from one part of the display
to another.

Translation by sub-pixel distances is equivalent to shifting the sampled signal by a distance which
is not a multiple of the sampling interval. This problem can be solved by reconstructing the original
signal and sampling it again at the shifted intervals (Figure 8-1). To find an algorithm for this
problem we should look at the sampling theorem and its implications.

Figure 8-1: Sampling and shifting (panels: sampled signal; sampled signal shifted by 2T; sampled
signal shifted by 1.5T).

Nyquist's sampling theorem states that a band-limited signal with a maximum frequency f can be
sampled at a spatial interval of 1/(2f) without losing any information contained in the signal, which
can hence be reproduced faithfully. Sampling signals that contain higher frequencies than allowed by
the Nyquist theorem results in the aliasing of the higher frequencies as lower ones. Figure 8-2 shows a
high and a low frequency, both of which result in the same samples. Low-pass filtering of the signal
before sampling can remove the aliasing problem; this preprocessing step is hence called
anti-aliasing. Ideally, a signal which is going to be sampled at time interval T should be prefiltered
using an ideal low-pass filter with an upper cutoff frequency of 1/(2T). This corresponds to convolving
the signal with the sinc function ($= \sin(\pi x/T)/(\pi x/T)$), which is the Fourier transform of the
ideal low-pass filter. The resulting signal can then be sampled to create the anti-aliased samples.
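The aliasing effect described above can be reproduced with a few lines of Python. The sketch samples
two cosines, one below and one above the Nyquist limit, and obtains the same sample values for both.
The sampling rate of 8 and the two frequencies are arbitrary values chosen for this illustration.

```python
import math

def sample(freq, rate, count):
    """Sample cos(2*pi*freq*t) at t = k / rate for k = 0 .. count-1."""
    return [math.cos(2 * math.pi * freq * k / rate) for k in range(count)]

rate = 8.0                     # samples per unit time; Nyquist limit is 4.0
low = sample(1.0, rate, 16)    # frequency below the limit
high = sample(7.0, rate, 16)   # 7.0 = rate - 1.0, above the limit
```

The two lists agree to within rounding error, so after sampling the frequency 7 is indistinguishable
from the frequency 1 unless it is filtered out beforehand.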


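The anti-aliased sample at $t = n$ is

\[
S(n) = \int_{t=-\infty}^{\infty} S(t)\,\frac{\sin\big(\pi (t-n)/T\big)}{\pi (t-n)/T}\,dt
     = \int_{t=-\infty}^{\infty} S(t)\,f(t-n)\,dt
\]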

Shown in Figure 8-3 are the two signals that have to be convolved to compute the sample S(n) at t =
n. In practice, a computationally easier filter like the triangular or the Gaussian filter is used for
anti-aliasing (Figure 6-3), but these approximations introduce aliasing error by allowing some high
frequency content of the original signal. Also, some of the low frequency content is lost because the
frequency response of the filter is not perfect in the ideal low-pass range.




Figure  8-3:  Filtering  S(t)  to  compute  sample  at  n. 


This set of samples has the same information content as the band-limited signal

\[
S(t) = \sum_{n=-\infty}^{\infty} S(n)\,\frac{\sin\big(\pi (t-n)/T\big)}{\pi (t-n)/T}
\]


The sinc function is hence the ideal reconstruction function because there is no loss of information in
the process. This yields an algorithm that can be used to time-shift the sampled signal by an
interval which is not a multiple of the sampling interval. The sample at a time m is given by

\[
S(m) = \sum_{n=-\infty}^{\infty} S(n)\,\frac{\sin\big(\pi (m-n)/T\big)}{\pi (m-n)/T}
\]

As in the case of filtering, this form of reconstruction requires the summation of an infinite sequence
to create every sample. To make the problem computationally tractable, we can approximate the sinc
function by one of several simpler functions. If we choose the triangular function then the
reconstruction is simply a linear interpolation of adjacent samples. Shifting a sequence by half the
sampling interval can hence be performed by averaging adjacent samples. A shift by an interval of
αT, with α between 0 and 1, can be done by computing

\[
S(n') = S(n)\,(1 - \alpha) + S(n+1)\,\alpha
\]

This  linear  interpolation  can  be  extended  to  two  dimensions  by  interpolating  four  adjacent  neighbors. 
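A one-dimensional version of this interpolation can be sketched in a few lines of Python: each output
sample is a weighted average of a sample and its right-hand neighbor, with weights (1 - α) and α. The
function name and the treatment of the last sample, which has no right-hand neighbor and is simply
reused, are assumptions of this sketch.

```python
def shift_fraction(samples, alpha):
    """Shift a sampled signal by a fraction alpha (0 <= alpha <= 1) of the
    sampling interval using linear interpolation:
        out[n] = samples[n] * (1 - alpha) + samples[n + 1] * alpha
    The last sample has no right neighbor, so it is reused (assumption)."""
    n = len(samples)
    out = []
    for i in range(n):
        right = samples[i + 1] if i + 1 < n else samples[i]
        out.append(samples[i] * (1 - alpha) + right * alpha)
    return out

half = shift_fraction([0, 4, 8, 4], 0.5)      # averages adjacent samples
quarter = shift_fraction([0, 4, 8, 4], 0.25)
```

For α = 0.5 the result is the average of adjacent samples, [2, 6, 6, 4]; for α = 0.25 it is
[1, 5, 7, 4].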

The linear interpolation algorithm for reconstructing signals from their samples causes a loss of
information, because in the ideal low-pass band, the triangular filter does not have a linear frequency
response (Figure 6-3). Successive interpolations will hence blur the image due to further loss of high
frequency information. However, the first few interpolations give satisfactory results.

Figure 8-4 shows the effect of successive interpolations on a sampled character. The lower left
corner contains adjacent characters "A" created by filtering a high resolution representation of the
character [Warnock 80]. In the pair just to the right of the pair in the lower left corner, the first
character is an identical copy of the original, and the second character is moved to the left by 1/16th
of a pixel by interpolating the original. The pairs further to the right have one original and the second
character moved horizontally 2/16 pixel, 3/16, ... and so on until the second character of the last pair
is moved by one full pixel. All moves are made by successive interpolations, i.e. the character with an
offset of 3/16 is created by moving the character with an offset of 2/16 by a distance of 1/16. The
vertical column on the left is similar to the bottom row with the difference that the second character
is moved vertically instead of horizontally. The other rows and columns are moved both vertically
and horizontally, with the horizontal displacement determined by the column number and the vertical
displacement determined by the row number. Each pair of characters is created by interpolating the
pair on its left, and if there is no set on the left then the set below is used. The figure shows that the
characters become unacceptable after approximately five or six successive interpolations.

Figure 8-5 shows the same character placements as Figure 8-4 with the difference that now all
characters with sub-pixel positioning are created from the master copy located at the lower left
corner. The result of this experiment is that interpolation is a satisfactory way of translating images by
sub-pixel distances if only the master copy is used. Successive interpolation will blur the image and
render it unsatisfactory.
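The difference between dragging an image by successive interpolations and moving it once from the
master copy can be demonstrated numerically. The Python sketch below moves a single bright pixel by
one full pixel in two ways: sixteen successive moves of 1/16 pixel, and one direct move. The test
signal and the zero padding at the boundary are assumptions chosen for this illustration.

```python
def shift_fraction(samples, alpha):
    """Linear-interpolation shift by a fraction alpha of one pixel.
    Samples beyond the right end are treated as 0 (assumption)."""
    n = len(samples)
    return [samples[i] * (1 - alpha)
            + (samples[i + 1] if i + 1 < n else 0.0) * alpha
            for i in range(n)]

master = [0.0] * 20 + [1.0] + [0.0] * 4     # a single bright pixel

# Sixteen successive moves of 1/16 pixel, each made from the previous result.
dragged = master
for _ in range(16):
    dragged = shift_fraction(dragged, 1.0 / 16.0)

# One move of a full pixel made directly from the master copy.
direct = shift_fraction(master, 1.0)
```

The direct move keeps the full peak intensity of 1, whereas after sixteen successive moves the peak
drops below 0.4 and the remaining intensity is smeared over the neighboring pixels, which is the
blurring observed in Figure 8-4.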

One practical implication of the interpolation technique is that gray-scale fonts need only store
one character positioned at a pixel center; interpolation can be used to get sub-pixel positioning.
Interpolation hence reduces the font space used by text and graphics fonts. (The graphics fonts are used
to store precomputed strokes for lines and patches for trapezoids.)

The interpolation algorithm relies on the linear reconstruction filter. Higher order filters will
result in smaller distortion; the sinc filter gives the ideal reconstruction. Such filters rely on
computing a function of more than two neighbors and are hence more expensive. They also result in
negative values for some samples, which are very hard to interpret and show on displays.

Parallel implementations of the interpolation algorithm require access to both rows and columns in
order to achieve a symmetrical speedup. The square memory organization provides such access. The
procedure used is identical to the BitBlt procedure with the operation replaced by the interpolation of
adjacent pixels. The neighbor connections are used to access the values of neighboring pixels. Since
any of eight neighbors might be required, the tri-state neighbor connections (Figure 4-14) are most
suitable for this application.

8.1.2.  Scaling 

Scaling is a commonly used operation to view an image at different sizes. Integer scaling can be
performed easily by replicating pixels. Such an operation is often not sufficient, especially in motion
sequences where gradual scaling is required. Non-integer scaling redistributes the intensity of
individual pixels. The scaling problem can be translated into an equivalent signal processing problem
of reconstructing the original from its samples, scaling the original signal, and then resampling the
scaled signal. Figure 8-6 shows the scaling of a signal by a scale of less than and greater than one. The
figure uses the triangular reconstruction for simplicity instead of the ideal sinc reconstruction
function. This leads to an obvious one-dimensional algorithm for scaling an image with n pixels by a
factor of α.

for x := 0 to ⌈α*n⌉ do begin
    xinv := x/α;
    xi := ⌊x/α⌋; xf := x/α - xi;
    I'[x] := I[xi]*(1-xf) + I[xi+1]*xf;
end;
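The one-dimensional scaling loop translates almost directly into Python. In the sketch below the
function name is hypothetical, the loop produces ⌈α*n⌉ output pixels, and reads past the end of the
source row are clamped to the last pixel; the clamping is an assumption, since the pseudocode does not
say how the boundary is handled.

```python
import math

def scale_1d(pixels, alpha):
    """Scale a row of pixels by the factor alpha using linear
    interpolation between the two nearest source pixels."""
    n = len(pixels)
    out = []
    for x in range(math.ceil(alpha * n)):
        xinv = x / alpha                # position of x in the source row
        xi = math.floor(xinv)           # index of the left neighbor
        xf = xinv - xi                  # fractional distance past it
        left = pixels[min(xi, n - 1)]
        right = pixels[min(xi + 1, n - 1)]   # clamp at the edge (assumption)
        out.append(left * (1 - xf) + right * xf)
    return out

enlarged = scale_1d([0, 10, 20, 30], 2)     # scale > 1
reduced = scale_1d([0, 10, 20, 30], 0.5)    # scale < 1
```

Doubling the ramp gives [0, 5, 10, 15, 20, 25, 30, 30], and halving it gives [0, 20]. Note that for
scales below one this simple resampling skips source pixels rather than averaging them.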

Figure 8-4: Character dragged by successive moves.

Figure 8-5: Character moved from master copy.

The scaling algorithm above uses real representations of α, xinv, and xf. An integer algorithm can be
formulated if we assume α to be representable as a fraction p/q of small integers. The scaling problem
can now be restated as the redistribution of p parts of each original sample into q parts of each scaled
sample (Figure 8-7) [Lucas 81]. This algorithm can be stated as

for x := 0 to ⌈p*n/q⌉ do
    I'[x] := 0;
for i := 0 to p*n do
    I'[i/q] := I'[i/q] + I[i/p]/p;
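The integer formulation can be sketched in Python as follows. Each source pixel is divided into p
equal parts, and every q consecutive parts are accumulated into one destination pixel. The function
name is hypothetical, the divisions in the indices are taken to be integer divisions, and the loop over
the parts runs from 0 to p*n - 1; these are assumptions of this sketch.

```python
def scale_integer(pixels, p, q):
    """Scale a row by the factor p/q. Source pixel j contributes the p
    parts i = j*p .. j*p + p - 1, each worth pixels[j] / p, and part i
    is accumulated into destination pixel i // q."""
    n = len(pixels)
    out = [0.0] * (-(-(p * n) // q))     # ceil(p*n/q) destination pixels
    for i in range(p * n):               # i indexes the individual parts
        out[i // q] += pixels[i // p] / p
    return out

scaled = scale_integer([6.0, 12.0], 3, 2)    # scale factor 3/2
```

For this input the result is [4, 6, 8]. As written, the redistribution preserves the sum of the
intensities (18 in both rows) rather than the intensity of each individual pixel.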


Figure 8-6: Scaling a one-dimensional signal (panels: scale > 1, scale < 1).

Figure 8-7: Scaling using integers.

Parallel implementations of the scaling algorithm are very hard because there does not exist a
one-to-one mapping from the pixels in the source image to pixels in the destination image. However,
if we are scaling a two-dimensional image in only one dimension, then we can operate on pixels in the
other dimension in parallel. The NxN memory organization will provide an N-fold speedup for images
scaled in only one of two dimensions. A scaling transformation for both dimensions can make two
passes over the image, one to scale vertically and one to scale horizontally.




8.1.3.  Rotation 

The BitBlt discussion in Chapter 4 covered rotations by multiples of 90 degrees and memory
architectures to implement them. These operations are similar to integer translation and scaling
because they can be performed by copying pixel intensities. Rotations by other angles not only cause
the pixels to get mapped in a non-linear way, but also require the intensity values to be redistributed.
Figure 8-8 shows the mapping of destination pixels to the source region when an image is rotated by
45 degrees. The dots show the locations of the original pixel array, and the circles show the locations
of the rotated pixel array. Linear interpolation can be used to compute the intensities of pixels that do
not lie on the source pixel grid.


Figure 8-8: Rotation by 45 degrees.

Rotation can be performed by using a combination of shearing and scaling transformations. A
shearing transformation maps a rectangular image into a parallelogram, leaving the orientation of one
of the parallel edges unchanged (Figure 8-9). These transformations are described by characteristic
matrices, which are 2x2 matrices used to multiply the old (x,y) coordinates to obtain the new
coordinates. Scaling transformations have characteristic matrices of the form


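\[
\begin{bmatrix} s & 0 \\ 0 & 1 \end{bmatrix} \text{ (horizontal) and }
\begin{bmatrix} 1 & 0 \\ 0 & s \end{bmatrix} \text{ (vertical)},
\]

and shearing transformations have characteristic matrices of the form

\[
\begin{bmatrix} 1 & v \\ 0 & 1 \end{bmatrix} \text{ (vertical) and }
\begin{bmatrix} 1 & 0 \\ h & 1 \end{bmatrix} \text{ (horizontal)}.
\]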


Figure 8-9: Shearing (panels: vertical shearing, horizontal shearing).

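Shearing and scaling can be used in combination to rotate an image by an arbitrary angle r,
because

\[
\begin{bmatrix} \cos r & \sin r \\ -\sin r & \cos r \end{bmatrix}
=
\begin{bmatrix} 1 & \tan r \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -\sin r \cos r & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & \cos r \end{bmatrix}
\begin{bmatrix} \frac{1}{\cos r} & 0 \\ 0 & 1 \end{bmatrix}
\]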
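The factorization of a rotation into a vertical shear by tan r, a horizontal shear by -sin r cos r, a
vertical scaling by cos r and a horizontal scaling by 1/cos r can be checked numerically. The Python
sketch below multiplies the four 2x2 matrices in that order and compares the product with the rotation
matrix. The helper names are assumptions made for this illustration.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotation_from_shears(r):
    """Build the rotation by angle r from two shears and two scalings."""
    c, s = math.cos(r), math.sin(r)
    shear_v = [[1.0, math.tan(r)], [0.0, 1.0]]   # vertical shear
    shear_h = [[1.0, 0.0], [-s * c, 1.0]]        # horizontal shear
    scale_v = [[1.0, 0.0], [0.0, c]]             # vertical scaling
    scale_h = [[1.0 / c, 0.0], [0.0, 1.0]]       # horizontal scaling
    return matmul(matmul(matmul(shear_v, shear_h), scale_v), scale_h)

r = math.radians(30.0)
product = rotation_from_shears(r)
expected = [[math.cos(r), math.sin(r)], [-math.sin(r), math.cos(r)]]
```

The product matches the rotation matrix to within rounding error. The factorization breaks down when
cos r is zero, so rotations by 90 degrees and its odd multiples have to be handled by moving pixels
instead.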

Carl F.R. Weiman [Weiman 80] presents an algorithm to implement the shearing transformation
by using a code devised and studied by Rothstein [Rothstein 79] for a transformation representable as
p/q. Rothstein's code is a binary sequence, the digits of which correspond to jumps in a bit-map line
with slope q/p. The code will have a 1 when this line jumps and a 0 whenever it does not. As an
example, the code for 4/3 will be (0111) when the origin of the line corresponds to a pixel center. For
lines whose origins do not correspond to pixel centers, similar codes can be constructed. Other
possible codes for 4/3 are (1110), (1101), and (1011). The algorithm performs the shearing by shifting
up each column (for vertical shearing) where the code is 1, doing this for every possible code, and
averaging the results (Figure 8-10, which is taken from [Weiman 80]).
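One way to generate such codes is sketched below in Python. The line of slope q/p is quantized with an
integer floor, and a digit is 1 whenever the quantized height increases between two adjacent columns.
The function name, the floor-based quantization and the integer phase parameter that models the
different origins are assumptions of this sketch; they were chosen because they reproduce the four
codes quoted for 4/3.

```python
def rothstein_code(p, q, phase=0):
    """Return p binary digits for a line of slope q/p (q <= p).
    Digit x is 1 when floor((q*x + phase) / p) increases between
    column x and column x + 1, i.e. when the line jumps by one row."""
    heights = [(q * x + phase) // p for x in range(p + 1)]
    return "".join(str(heights[x + 1] - heights[x]) for x in range(p))

codes = [rothstein_code(4, 3, k) for k in range(4)]
```

For 4/3 this yields 0111, 1011, 1101 and 1110 for the phases 0 to 3. Every code contains q ones and
p - q zeros, so the averaged result of the column shifts has the required overall slope.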

Weiman's algorithm has a BitBlt implementation in which each column (again for vertical
shearing) gets added to its destination, followed by dividing the intensity of each pixel in the result
by q.




8.2.  Convolution 

A transformation which changes the high-frequency characteristics of an image can be
implemented as a convolution of the image with a set of coefficients which are zero everywhere
except in a small window. Such convolutions can be used for operations like smoothing, edge
detection and enhancement, spatial reduction, or any other operation which changes only the high
frequency characteristics of the image. We shall discuss edge detection as an example.

Edge detection is used during image processing for picture segmentation. It segments the picture
into regions based on the detection of discontinuities in the intensity level. Such a discontinuity is
called an edge, and there are several techniques used to detect such edges. Most of them use
derivative operators, which compute the gradient of the image intensity, and hence give high values
at points where the intensity level of the picture is changing rapidly. One of the popular digital
gradient approximations is due to Roberts [Roberts 65], which computes the following function of
each pixel (x,y)

\[
\max\big(\,|f(x,y) - f(x+1,y+1)|,\; |f(x+1,y) - f(x,y+1)|\,\big).
\]

This  function  relies  on  the  fact  that  differences  in  any  pair  of  perpendicular  directions  can  be  used  to 
compute  the  gradient. 
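A direct, sequential version of Roberts' function is short enough to state in full. The Python sketch
below computes the larger of the two diagonal differences for every pixel that has a right and a lower
neighbor. The function name and the choice to return a result that is one row and one column smaller
than the input are assumptions of this sketch.

```python
def roberts(image):
    """Roberts gradient of an image given as a list of rows, indexed as
    image[y][x]. Pixels in the last row and last column have no diagonal
    neighbor, so the result has (rows-1) x (cols-1) entries."""
    rows, cols = len(image), len(image[0])
    return [[max(abs(image[y][x] - image[y + 1][x + 1]),
                 abs(image[y][x + 1] - image[y + 1][x]))
             for x in range(cols - 1)]
            for y in range(rows - 1)]

# A vertical edge between the second and third columns.
picture = [[0, 0, 9, 9],
           [0, 0, 9, 9],
           [0, 0, 9, 9]]
edges = roberts(picture)
```

The result is [[0, 9, 0], [0, 9, 0]]: the function is zero in the flat regions and large where the
intensity changes.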

A parallel implementation of Roberts' function or any 3x3 convolution requires each pixel to
access its nearest eight neighbors. It can read these values and combine them with its own to compute
the function. This parallel approach will work even with NxN squares (if N > 3), the only problem
being that pixels on the edges cannot get access to all their neighbors. This can be solved by
overlapping successive squares by one pixel in both directions.

An outline of the algorithm to perform Roberts' function follows. Place the initial NxN square at
the top left corner of the image. All pixels except the ones on the lower and left edges will compute
the convolution function. Move the NxN square right by (N-1) pixels and recompute. Repeat until
the right edge has been reached. Now place the NxN square at the left edge (N-1) pixels below the
previous left edge placement. Repeat these two nested loops until the whole image has been
processed. Notice that this algorithm is an optimization of the most obvious algorithm, which overlaps
the squares by two pixels and performs no computation for all edge pixels. This algorithm can be
easily extended for larger windows.
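The placement of the overlapping squares can be sketched as two nested loops that step by N-1 pixels.
The Python function below only generates the top-left corners of the squares; its name and the
stopping rule, which ends each loop once every pixel that has a neighbor in the next row and column has
been covered, are assumptions of this sketch.

```python
def square_origins(width, height, n):
    """Top-left corners of the n x n squares used to sweep an image of
    width x height pixels. Successive squares overlap by one pixel, so
    each placement advances by n - 1 pixels."""
    if n < 2:
        raise ValueError("n must be at least 2")
    step = n - 1
    xs = range(0, max(width - 1, 1), step)
    ys = range(0, max(height - 1, 1), step)
    return [(x, y) for y in ys for x in xs]   # left to right, then downward

origins = square_origins(15, 8, 8)
```

For a 15x8 image and 8x8 squares this gives two placements, at (0, 0) and (7, 0). Each square computes
a 7x7 block of results, and together the blocks cover every pixel that has both a right and a lower
neighbor.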

Convolution,  like  other  image  processing  applications  relies  on  neighbor  interconnections  for 




8.3.  Conclusion 

This chapter presented algorithms for a few low level image processing applications that can be
implemented effectively for the NxN square memory organization. The intent of this presentation is
not to present the best algorithms for these applications, but to show that the square memory
organization is indeed effective for image processing in addition to being well suited for BitBlt and
graphics.




Chapter  9 
Display  Design 


The architectures and algorithms developed in this thesis lead to one conclusion: the critical
component of a display design is the underlying memory system. The memory system design
determines the upper limit of the speed at which the display can be updated. This thesis discusses
several memory organizations; the symmetric square organization proves to be most effective in
providing parallel updates for BitBlt, graphics, and the image processing applications.

The MxM square memory organization is defined to use M^2 memory chips which can be read or
written simultaneously. But the term memory chip has always been used loosely. In Chapter 2, each
memory chip could compute addresses from the location of the display region being accessed. It
could also compute masks to disable a subset of the memory chips. In Chapter 4, the memory chips
were able to communicate data to other memory chips. They could also perform simple arithmetic
and logic operations required by BitBlt. In Chapter 7, each of the memory chips contained a
processor which could perform distance computations and use these distances to look up anti-aliasing
tables to determine pixel intensities. Finally, in Chapter 8, we abandoned the hope of each memory
chip having only a small special purpose processor by providing fairly general operations.

None of these memory chip requirements is met by any commercial memory part. One possible
structure of a smart display memory chip that meets these requirements is shown in Figure 9-1. Since
each memory chip is defined to have the ability to read or write one pixel in every memory cycle, the
heart of the smart display memory chip is a pixel RAM. The pixel RAM is a conventional random
access memory which provides as many bits in each memory cycle as the number of bits in each pixel.
The processor associated with the pixel RAM has exclusive control of the address and write enable
wires of the RAM and exclusive access to the data. The interprocessor communication wires provide
the data communication required by BitBlt and the image processing applications. The video buffer
aids the generation of video and is discussed in Section 9.1.1.

Figure 9-1: Smart display memory chip.

Present technological limitations do not allow the fabrication of a smart display memory chip as a
single chip. However, the RAM can be separated from the rest of the chip as shown in Figure 9-2.
This allows the use of commercial RAM parts and still requires only one chip to be custom designed.
The processor chip used in such a scenario is referred to as the display chip.

This chapter discusses the design of a display memory system for the 8x8 memory organization.
Sections 9.1, 9.2, and 9.3 explore the design space. Section 9.4 describes DChip, a display chip
designed for such a display system. The DChip design is optimized for BitBlt, although it is also able
to perform the computations for the other operations discussed in this thesis. Section 9.5 presents
simulated instruction sequences of how DChip can perform various operations. These simulations are
then used to predict the performance for these operations. Section 9.6 discusses issues about scaling
the design of DChip with respect to various parameters.

9.1.  Memory  organization 

The 8x8 memory organization has several implementation choices. The 8x8 squares can be
staggered (see Section 2.3) to allow the access of both 8x8 squares for the update operations and 64x1
spans for the video scanning. This choice is discussed in Section 9.1.1. To provide effective
bandwidth utilization for both BitBlt and precomputed graphics algorithms, we choose to provide the
ability to update arbitrarily aligned 8x8 squares. We shall, however, restrict ourselves to updating
contiguous display areas. As a result different parts of the memory array will receive different
addresses. The various choices for computing the memory addresses are discussed in Section 9.1.2.

Figure 9-2: Display chip with 4 RAM chips for a 4 bit gray-scale system.

Jim Clark and Marc Hannah use an addressing scheme in which each memory chip receives an
independently computed address, as a result of which the display area accessed is not necessarily
physically contiguous [Clark 80]. Their scheme results in a design which is substantially more
complicated than the one discussed in this chapter.

9.1.1.  Screen  refresh 

In one memory access the unstaggered 8x8 organization provides only 8 pixels of the scan line
being scanned. Because of the desire to waste the smallest fraction of bandwidth for the video
scanning process, we have to save the remaining 56 pixels in a separate piece of memory. This
memory is called the video buffer, and contains storage for at least 8 complete scan lines. Having
storage for only 8 scan lines forces the video controller to carefully time the writing and reading of
the video buffer, which may be very difficult if the access time of the video buffer is only slightly
higher than 8 times the video rate. However, if the video buffer has storage for 16 scan lines, then the
video controller can write one set of 8 scan lines while reading the other (i.e., double buffering). Also,
the frame buffer can be read in a burst which accesses all the 8 scan lines at once or in smaller bursts.




Burst access of the unstaggered 8x8 organization along the scan line direction requires only the
column address to be incremented by one (Section 2.2). This allows the use of page mode provided by
commercial RAMs, which saves a factor of two in the frame buffer bandwidth used for video
scanning. To avoid concurrent access of the video buffer for reading and writing, 1/8 of the next 8
scan lines can be written during each horizontal retrace because no reads are required during that
period. In such an implementation only 1/8 of the total number of 8x8 squares along a scan line are
read during each frame buffer burst access.

Only 8 pixels out of the 64 available during the access of the video buffer are used to generate the
next 8 pixels of video. This implies the use of multiplexing to select the appropriate 8-pixel row. The
multiplexing can be implemented by using 8 external 8x1 multiplexers [7]. The presence of output
enables on the outputs of the video buffer provides the same multiplexing without the use of external
multiplexers. The video generation from the video buffer can be implemented as shown in
Figure 9-3 [8].


Figure 9-3: Video generation.


The 8x8 display implemented the video buffer using RAMs, although it does not need the
generality of random access. Both the reading and writing are serial in the same direction, which
suggests the use of shift register memory. Figure 9-4 shows a double rail shift register, 64 copies of
which can be used as the video buffer. It consists of two dynamic shift registers which shift in
opposite directions. The length of each is 1/8 of the length of the scan line. The data in the Input
Shift Register can be transferred in parallel to the Output Shift Register. Only the input to the Input
Shift Register and the output of the Output Shift Register are accessible. This buffer is used by
shifting in data serially into the Input Shift Register, transferring all the data to the Output Shift
Register, and then shifting it out in the reverse order. To use it as a video buffer the scan line would
be shifted in in reverse order, transferred to the Output Shift Register, and read out as video. The
reason for the shift registers to run in opposite directions is to use the buffer for displays with
different scan-line lengths. If the size of the scan line is smaller than the maximum, we can just ignore
the unused part of the shift register. The video buffer can be fabricated as part of the display chip and
requires a total of g+1 extra pins, where g is the number of bits in the gray scale. Even though a shift
register may be used instead of RAM as the video buffer, a fairly large area has to be devoted to its
implementation. For a g-bit gray scale system with 768 pixel scan lines, the video buffer stores 12g
kilobits, or 192g bits in every display chip.

[7] Assuming single bit video. If the video has several bits of gray scale, then these numbers are
multiplied by the number of bits in the gray scale.

[8] The figure only shows a 4x4 square, but is easily extended to an 8x8.
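The storage figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the double-buffered video buffer of 16 scan lines described earlier in this section and 64 display chips; the function name and parameter names are illustrative only.

```python
def video_buffer_bits(pixels_per_line=768, buffered_lines=16, gray_bits=1, chips=64):
    """Return (total bits, bits per display chip) for the video buffer.

    Assumes storage for 16 scan lines (two sets of 8, double buffered)
    spread evenly over the 64 display chips of the 8x8 array.
    """
    total = pixels_per_line * buffered_lines * gray_bits
    return total, total // chips


if __name__ == "__main__":
    for g in (1, 4):
        total, per_chip = video_buffer_bits(gray_bits=g)
        print(g, "bit(s):", total // 1024, "kilobits total,", per_chip, "bits per chip")
```

For g = 1 this reproduces the 12 kilobits and 192 bits per chip stated in the text.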


Figure 9-4: Video buffer. [Input Shift Register with Data In and Clock In; Transfer; Output Shift
Register with Data Out and Clock Out.]
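The behavior of the double rail shift register can be illustrated with a small software model. This is only a sketch under stated assumptions: the class and method names are hypothetical, cell 0 is taken to be the end shared by Data In and Data Out, and a length of 96 cells corresponds to 1/8 of a 768 pixel scan line.

```python
class DoubleRailShiftRegister:
    """Software model of the two opposed shift registers (names are illustrative)."""

    def __init__(self, length=96):
        self.input_reg = [0] * length
        self.output_reg = [0] * length

    def shift_in(self, bit):
        # New data enters at cell 0; older data moves toward higher cells.
        self.input_reg = [bit] + self.input_reg[:-1]

    def transfer(self):
        # Parallel copy from the Input Shift Register to the Output Shift Register.
        self.output_reg = list(self.input_reg)

    def shift_out(self):
        # Opposite direction: data leaves from cell 0, the rest moves toward cell 0.
        bit = self.output_reg[0]
        self.output_reg = self.output_reg[1:] + [0]
        return bit


def buffer_scan_line(pixels, length=96):
    """Shift a chip's share of a scan line in reversed, then read it out as video."""
    reg = DoubleRailShiftRegister(length)
    for p in reversed(pixels):
        reg.shift_in(p)
    reg.transfer()
    return [reg.shift_out() for _ in pixels]
```

Because both registers use the same end, a scan line shorter than the register simply leaves the far cells unused, which is the property the text attributes to the opposed shift directions.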


The staggered addressing allows the access of 64 horizontal pixels, which eliminates the need for
the video buffer (see Section 2.3). The multiplexing shown in Figure 9-3 can still be used to generate
the video. Burst accesses of the frame buffer cannot take advantage of the page mode memory access.
Burst access may still be useful for the overall system design because it implies a lower frequency of
swapping between the update and video refresh tasks. For example, each display chip could have a
12-pixel FIFO buffer which could be filled during a burst access of the frame buffer and used for
refreshing a 768 pixel scan line.

To effectively utilize the memory bandwidth of any organization, we have to implement fast
address computations. Ease of implementation for the address computations caused the choice of the
unstaggered square organization. This organization forces the use of a video buffer, but also offers
higher update bandwidth because of the ability to use page mode accesses for screen refresh. The
discussion in the rest of this chapter refers to the unstaggered 8x8 organization.

9.1.2. Memory addressing

Allowing for the access of arbitrarily aligned 8x8 squares implies the addressing of different parts
of the memory chip array with different addresses. If the display is built using smart display memory
chips, then the address computation can be performed on-chip and the external world would provide
only the location of the square being accessed. We are, however, going to use commercial memory
with display chips, and have two alternatives for generating addresses for the memory. We can either
use the display chips to generate the addresses or alternatively generate the addresses outside the
display chips and directly drive the address wires of the memory chips. The first alternative is
identical to the smart display memory approach.

If the addresses for the memory chips are generated externally, and the display chips are used only
for data, then there are several possible alternatives to implement addressing. The most obvious
implementation of the addressing requires 64 sets of address wires and a large amount of
multiplexing hardware to select the appropriate address on these wires. However, a simple
space-time tradeoff can implement this addressing more economically by using eight sets of address wires
bussed down the columns of the memory chip array (Figure 9-5). If each memory chip is a 64K
RAM, then each address bus has eight address wires to carry either the row or the column addresses.
The RAS (Row Address Strobe) lines are bussed across the rows and the CAS (Column Address
Strobe) lines are bussed down columns. Three addressing steps are required to provide each chip
with the correct address. To address the square located at (x*8 + i, y*8 + j), the first step places y on all
the address busses and strobes the RAS lines on (8 - j) rows from the bottom. The second step places
y + 1 on all the address busses and strobes the RAS lines on j rows from the top. Now all the chips
have the correct row address. The third step places x + 1 on the first (8 - i) address busses and x on
the rest of the i address busses and strobes the CAS lines on all the chips. This implementation uses
only 8 address busses instead of 64 but has to use three cycles instead of the usual two to deliver a full
address. This mechanism was used in the 8x8 display.

Alternatively, the location of the square to be accessed can be broadcast to all the display chips,
which can then individually compute the row and column addresses using the algorithm
discussed in Section 2.2. Assuming (x,y) is the location of the top left pixel of the 8x8 square, then:




    CA(x+8,y) := CA(x,y) + 1,
and the address for (x,y+8) can be updated by
    RA(x,y+8) := RA(x,y) + 1.

The discussion of BitBlt in Chapter 4 showed that large area BitBlts update squares in close
proximity to each other. The easy address updates for such squares imply that the initial address
computing procedure can be slow and may indeed be performed as a sequence of instructions. But
small area BitBlts (e.g. characters) and the graphics algorithms that use precomputed strokes and
patches still require fast address computations.

The display chip can use a hardwired circuit to compute the row and column addresses. The
computing of row and column addresses uses identical algorithms, and because these addresses are
used at different times, one circuit will suffice to compute both.

An advantage of storing x and y in all display chips is the increase in performance when only one
of the parameters changes, although this is more expensive because each display chip devotes space
for storing duplicate information. A different possibility is to store x and y externally and
broadcast them to the display chips only when the addresses are computed. Figure 9-6 shows this
technique in the form of a block diagram. Although the algorithm is insensitive to whether the row or
the column address is being computed, a single bit of information is required to determine whether to
use the row or the column number in the comparison. A simplification of this scheme performs the
comparison externally and instructs each display chip whether or not to increment the address
(Figure 9-7). During the column address phase, all display chips in the same column receive the same
value of CInc. Similarly during the row address phase, all display chips in the same row receive the
same value of RInc. The CInc control pins can hence be bussed vertically and the RInc control pins
can be bussed horizontally (Figure 9-8). During the column address phase, the ROM on the top of
the chip array will use the value of x%8 to compute the values of the CInc wires that are enabled
vertically to the display chips. Table 9-1 tabulates the contents of this ROM. The same operation is
executed by the ROM on the left edge during the row address phase.
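The increment scheme can be sketched in software to check that every chip ends up addressing a pixel inside the requested square. The sketch below follows the comparison shown in Figure 9-6 (increment when the chip's column number c is less than x%8) and applies the same rule to rows. It assumes that pixel (px, py) is stored in the chip at column px%8, row py%8, at column address px/8 and row address py/8; that mapping, the function names, and the `inc_rom` contents are assumptions for illustration and are not taken from Table 9-1.

```python
N = 8  # chips per row and per column of the memory array


def chip_address(x, y, c, r):
    """Column and row address used by the chip in column c, row r
    when the square whose top left pixel is (x, y) is accessed."""
    ca = x // N + (1 if c < x % N else 0)  # CInc asserted for this column
    ra = y // N + (1 if r < y % N else 0)  # RInc asserted for this row
    return ca, ra


def inc_rom(offset):
    """Increment bits for the eight CInc (or RInc) wires at a given offset
    (hypothetical ROM contents derived from the comparison above)."""
    return [1 if k < offset else 0 for k in range(N)]


def pixels_accessed(x, y):
    """Set of display pixels touched by one access, under the assumed mapping."""
    pixels = set()
    for r in range(N):
        for c in range(N):
            ca, ra = chip_address(x, y, c, r)
            pixels.add((ca * N + c, ra * N + r))
    return pixels
```

Under these assumptions, `pixels_accessed(19, 37)` covers exactly the 64 pixels of the square with top left corner (19, 37), so one increment bit per column and per row is sufficient.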


    CA := x/8;
    i := x%8;
    if (c < i) then CA := CA + 1;

Figure 9-6: Address computation by each display chip.

Figure 9-7: Simplified address computation. [Inputs: x/8 and CInc.]


9.1.3.  Masking 

Each display chip controls the write-enable signals of its memory chips. This allows selective
update in two different ways. First, any subset of the display chips may enable their memory chips to
update only part of an 8x8 square. Second, each display chip may enable a subset of its memory chips
to selectively update only a subset of the bit planes. In order to reduce the pin count of the display chip,
the second capability has been eliminated from the present design. However, selective update of bit
planes is still possible by using a read-modify-write cycle.




The rectangular masks can be forced to be aligned with at least one of the corners of the 8x8 square
because the memory organization allows the access of arbitrarily aligned squares. As shown in Figure
9-9, the 8x8 mask can be formed by ORing two 8-bit masks (called xmask and ymask), one for each
dimension. The 8-bit xmask contains zeroes in the columns that have to be enabled and ones in the
columns to be disabled [9]. Because the masks are used only for corner aligned rectangles, the zeroes
are contiguous and flushed either to the left or the right. There are 16 such masks for each dimension,
a total of 32 masks. These masks have to be aligned with the display memory's word boundaries. For
example, if the update is not aligned with an 8x8 boundary, then the top left corner bit of the mask has
to be used in the chip that contains the top left corner of the square being updated.


            xmask    0 0 0 0 0 1 1 1

    ymask   0        0 0 0 0 0 1 1 1
            0        0 0 0 0 0 1 1 1
            1        1 1 1 1 1 1 1 1
            1        1 1 1 1 1 1 1 1
            1        1 1 1 1 1 1 1 1
            1        1 1 1 1 1 1 1 1
            1        1 1 1 1 1 1 1 1
            1        1 1 1 1 1 1 1 1

Figure 9-9: Masking 8x8 squares using xmask and ymask.

As in the case of addresses, these masks can be computed. Four parameters are needed to compute
each mask: the count of zeroes in each direction and the boundary offsets of the square being
updated. The mask computing algorithm will not be discussed; instead a table lookup technique is
used to access the precomputed masks stored in the frame buffer. Two memory accesses are required
to access each mask (Figure 9-10). The first memory access uses three parameters: xmask is the number
of columns enabled, x%8 is the x-offset of the square being updated from the 8x8 square boundary,
and xdir is 0 if the enabled columns are left aligned and 1 if they are right aligned. The second
memory access uses the y-directional parameters. The two masks are then ORed to produce the final
mask.

[9] A zero as the write enable bit enables the write operation.


Figure 9-10: Masking 8x8 squares. [Example parameters: x%8 = 2, xmask = 3, xdir = 0;
y%8 = 4, ymask = 5, ydir = 1.]
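Although the design uses a table lookup rather than computing masks, the entries of such a table can be illustrated with a short sketch. The rotation rule below (the mask bit for a chip depends on its position relative to the offset of the square) is an assumption, chosen to be consistent with the x-direction example in Figure 9-10; the function names are hypothetical.

```python
N = 8  # chips along each dimension


def mask_1d(count, offset, direction):
    """One 8-bit mask indexed by chip column (or row); a 0 enables the write.

    count     -- number of columns (rows) to enable
    offset    -- x%8 (or y%8) of the square being updated
    direction -- 0: enabled positions flush left/top, 1: flush right/bottom
    """
    mask = []
    for chip in range(N):
        s = (chip - offset) % N  # position of this chip within the square
        enabled = s < count if direction == 0 else s >= N - count
        mask.append(0 if enabled else 1)
    return mask


def square_mask(xmask, ymask):
    """OR the two one-dimensional masks into the 8x8 write-enable mask."""
    return [[xb | yb for xb in xmask] for yb in ymask]


if __name__ == "__main__":
    for row in square_mask(mask_1d(3, 2, 0), mask_1d(5, 4, 1)):
        print(" ".join(str(b) for b in row))
```

With x%8 = 2, xmask = 3, and xdir = 0, `mask_1d` yields 1 1 0 0 0 1 1 1, matching the column pattern in the figure; with zero offsets it reproduces the masks of Figure 9-9.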


9.2.  Interprocessor  communication 

The display chips in the array have to communicate with each other for two different reasons: data
alignment for BitBlt applications and neighbor access for the image processing applications. The
communication pattern for BitBlt is simply a two-dimensional circular shifting of the data. It can use
one of several mechanisms proposed in Chapter 4. The faster mechanisms require more pins than the
slower ones.

Figure 9-11 shows a display chip with four tri-state paths such that, during one cycle, the display
chip can enable its data on any one of the paths and can also latch the data from any of the remaining
paths. This requires 4g pins for data (g is the number of bits in a pixel), and four pins for control: two
to specify the input direction and two to specify the output. This configuration can be used to
implement all the interconnection mechanisms discussed in Chapter 4. In the case of mechanisms
that use only two paths, like the neighbor connections in the one-dimensional organization and the
sequential connection in the square organization, only two paths are bonded to implement these
mechanisms without the use of extra pins. The proposed configuration is ideally suited for the
tri-state communication, which requires three control pins to specify the direction of transfer. This
configuration hence uses one extra pin for control, but provides the versatility of being usable for
other interconnection mechanisms.

Figure 9-11: Four tri-state paths for each display chip. [Panels: scan-line neighbor, square neighbor,
square tri-state.]

The tri-state paths allow access to any of the eight neighbors in each cycle. The image processing
algorithms discussed in the previous chapter use neighboring pixel intensities and can be
implemented efficiently using the four tri-state data paths.


9.3.  Processor 

The computations required of the display chip are as follows:

1. BitBlt uses elementary boolean operations for bit-map images (Table 4-1), and simple
arithmetic operations for gray-scale images (Table 4-2).

2. The parallel algorithm to compute line strokes computes the perpendicular distance to
the line (Figure 7-4). This distance computation requires the multiplication of numbers in
the range between 0 and 7. This multiplication can be implemented using three
sequential additions, although a single-step implementation uses only a 3-input adder.
The same algorithm is used for computing trapezoidal patches. The distance computation
is followed by a table lookup to find the anti-aliased intensity of individual pixels. This
table can be part of the processor but we choose to store it in the frame buffer memory.
This choice has two implications. Firstly, the frame buffer memory must be larger than
the display area. Secondly, the processor must be able to modify the addresses of the
frame buffer memory with a computed value, which in this case is the perpendicular
distance.

3. The algorithms for sub-pixel translation and non-integer scaling require linear
interpolations. Linear interpolation is a computation of the form (Aa + B(1 - a)), where
A and B are two intensity values and a is a real number less than one. If a can be
represented as a fraction p/q such that q is a power of 2, then the interpolation can be
implemented as a series of additions followed by right shifts.
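The last two items can be made concrete with a small sketch that uses only additions and shifts. It is an illustration under stated assumptions, not the DChip instruction sequence: the function names are hypothetical, q is fixed at 2^log2_q, and integer truncation by the final right shift is assumed.

```python
def shift_add_multiply(value, multiplier, bits):
    """Multiply by a small non-negative integer using shifts and additions only.

    A 3-bit multiplier (0..7) needs at most three additions.
    """
    product = 0
    for k in range(bits):
        if (multiplier >> k) & 1:
            product += value << k
    return product


def interpolate(a_intensity, b_intensity, p, log2_q=3):
    """Compute A*a + B*(1 - a) for a = p / 2**log2_q with additions and shifts."""
    q = 1 << log2_q
    # q - p can equal q itself when p is 0, so one extra multiplier bit is allowed.
    total = (shift_add_multiply(a_intensity, p, log2_q + 1)
             + shift_add_multiply(b_intensity, q - p, log2_q + 1))
    return total >> log2_q


if __name__ == "__main__":
    print(interpolate(15, 7, 4))  # a = 1/2, halfway between 7 and 15
```

For example, with A = 10, B = 2 and a = 2/8 the sketch computes (20 + 12) >> 3 = 4, which equals 10(0.25) + 2(0.75).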

For the applications discussed in this thesis, the display chip processor can perform these
computations with an extremely simple ALU together with a few registers to store temporary values.

9.4.  Display  chip 

This section describes a display chip (DChip) that is designed for a 4-bit gray-scale 8x8 display
system. An 8x8 display system will use 64 copies of this display chip. Each display chip interfaces to
four 64K RAM chips. The chip provides tri-state connections to four adjacent neighbors. Due to the
lack of pins, the chip provides only a video buffer for single-bit video. A block diagram of this chip is
shown in Figure 9-12, and a checkplot of the chip is shown in Figure 9-13.

9.4.1. Data path

The  DChip  data  path  is  8  bits  wide  and  contains  nineteen  registers  and  an  ALU  connected  by  a 
single  bus.  The  ALU  performs  8-bit  computations  and  has  two  input  registers  for  its  input  operands 
and  two  output  registers  one  of  which  can  be  chosen  to  deposit  the  result  of  the  operation. 

Each instruction execution is composed of the two non-overlapping phases of a two-phase clock.
During the first phase the bus transfers data from one register to another. One of the registers
can be enabled onto the bus and the contents of the bus can then be loaded into a different register.
Meanwhile the carry path of the ALU is precharged during this phase of the clock. During the second
phase (φ2), the ALU performs its computation and loads the result into either one of the output
registers (ALUO1 or ALUO2). The operands of the computation come from the ALU input registers
ALUI1 and ALUI2, which must have been set up during previous register transfer operations.
Double operand operations will hence take at least two instructions to execute. The bus is precharged
during the second phase of the clock.


Figure 9-12: Block diagram of DChip.


Four registers (MADDR, MDATAIN, MDATAOUT, and MWE) interface the data path with the
RAM chips. MADDR can be used to provide an 8-bit address to memory chips, MDATAIN is used
to latch the result of a memory read, while MDATAOUT provides the data for a memory write. Only
the lower order 4 bits of these registers are used. The lowest order bit of the MWE register is enabled
onto the write enable pins of the memory chips during a write operation.


Another register (COM) interfaces the data path to neighboring DChips.


Figure 9-13: Checkplot of DChip.


9.4.2.  Memory  interface 

DChip interfaces to the four 64K RAM chips by 13 wires. 8 of these wires (ADDRESS) carry the
address and are bussed to all the four memory chips. 4 wires (DATA) are used for the data of the four
memory chips. These wires are bidirectional because they are used to both read and write memory
data. The 13th wire (WE) is bussed to the write enable pins of all the four memory chips.

The address is generated from either the MADDR register of the data path or the input from the
IMDATA pins. The IMDATA passes through an incrementor, the result of which is either the same
as the input or incremented by one. The result from the incrementor is then multiplexed with the
contents of the MADDR register to the input of the pipelining latch before the ADDRESS wires.
The MADDR register is intended to provide computed addresses to memory and is used to provide
table look-up operations.

The incrementor and the multiplexer are controlled by the two bits of the ADDRSELECT input.
If both the bits are zero, then the value presented at the IMDATA input is available unchanged to the
input of the pipelining latch. If exactly one of the bits is one, then the IMDATA input is incremented
by one and selected to the input of the latch. (Note that the ADDRSELECT inputs act in a manner
similar to the RInc and CInc control pins discussed in Section 9.1.2.) If both of the bits are one, then
the content of the MADDR register is selected to the input of the latch. The pipelining latch is
controlled by the ADDRLATCH input. When this control signal is high the input to the latch is
latched and remains latched even when the signal goes low. The output of the latch is always enabled
on the ADDRESS pins.
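The address selection just described can be summarized as a small behavioral sketch. It assumes that the 8-bit address wraps around on increment, which the text does not state, and the function and class names are illustrative only.

```python
def address_latch_input(imdata, maddr, addrselect):
    """Value presented to the pipelining latch for a 2-bit ADDRSELECT value."""
    if addrselect == 0b00:
        return imdata & 0xFF        # IMDATA passed through unchanged
    if addrselect == 0b11:
        return maddr & 0xFF         # computed address from the MADDR register
    return (imdata + 1) & 0xFF      # exactly one bit set: IMDATA incremented


class PipeliningLatch:
    """Holds the address driven on the ADDRESS pins."""

    def __init__(self):
        self.address = 0

    def update(self, value, addrlatch):
        if addrlatch:               # capture only while ADDRLATCH is high
            self.address = value
        return self.address         # output is always enabled
```

In this model a row or column increment request (either ADDRSELECT bit) yields the broadcast address plus one, while the both-bits-set case bypasses the broadcast address for table look-ups.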

The DATA pins are tri-state pins that can be driven either from the chip or externally from the
memory chips. If driven from the chip, they are driven with the contents of the MDATAOUT
register. If driven by the memory chips, then the contents can be latched into the MDATAIN
register. The two functions are controlled by the DATAENABLE and DATALATCH inputs. When
both these inputs are low, the DATA pins are held at high impedance. When DATAENABLE is
asserted, the MDATAOUT register is enabled out, and when DATALATCH is asserted the input is
latched into the MDATAIN register.

The value of the lowest order bit of the MWE register is enabled onto the WE pin when
DATAENABLE is on.


None of the elements in the path of the memory address are controlled by the data path clock; the
memory can hence be addressed and accessed asynchronously. The pipelining latch is intended to
speed up the addressing process.

9.4.3.  Interprocessor  communication 

The display chips can communicate with other similar chips to provide the data movement needed
to move parts of the image from one part of the display to another. Each display chip has four
independent tri-state ports which can be used to interconnect several of these chips into one of
several interconnection mechanisms (Figure 9-11). Data from the COM register can be sent to a
neighbor and the data from another neighbor can be latched into the same register during each
communication cycle, which is controlled by the COMCLOCK. Because the transfer is expected to
take place in one COMCLOCK cycle, the COM register is implemented as a Master-Slave register.
When the clock is high, the contents of the Slave are enabled out onto one of the four ports and the
contents of a different port are latched into the Master part of the register. When the clock goes low,
the contents of the Master are transferred to the Slave. The direction of the transfer is controlled by
two control inputs, COMOUT and COMIN, each of which contains two bits. These controls
specify, respectively, the port onto which the data is to be enabled and the port from which the data
is to be latched.
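The role of the Master-Slave register can be seen in a small simulation of one row of chips. This is a sketch under assumptions not fixed by the text: eight chips connected in a ring, every chip driving its right-hand port and latching from its left-hand port, and hypothetical class and function names.

```python
class ComRegister:
    """Master-Slave COM register of one display chip (illustrative model)."""

    def __init__(self, value):
        self.master = value
        self.slave = value


def com_cycle(chips):
    """One COMCLOCK cycle on a ring of chips, all shifting to the right."""
    n = len(chips)
    # Clock high: each Slave drives its output port while each Master
    # latches the value driven by the left neighbor's Slave.
    for i, chip in enumerate(chips):
        chip.master = chips[(i - 1) % n].slave
    # Clock low: the Master contents are transferred to the Slave.
    for chip in chips:
        chip.slave = chip.master


def circular_shift(values, distance):
    """Circularly shift one row of pixel values by `distance` chips."""
    chips = [ComRegister(v) for v in values]
    for _ in range(distance):
        com_cycle(chips)
    return [chip.slave for chip in chips]
```

Because the Slaves hold their values while the Masters are being loaded, every chip can send and receive through the same register in a single cycle without overwriting data that a neighbor has yet to read.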

The communication timing, similar to memory addressing, is asynchronous to the data path clock.
This is to allow the Align operation of BitBlt to be performed as fast as possible, independent of the
speed of the other parts of the system. It also allows the Align operation to overlap with memory
cycles, a requirement discussed in Section 4.3.4.

9.4.4.  Op  code 

The bus and the ALU are controlled by an eight-bit op-code. During the first phase of the processor
clock these eight bits specify the source and destination registers for the bus transfer. During the
second phase (φ2), they specify the ALU operation.

The bus transfer takes place by enabling the contents of one of the registers onto the bus and then
latching the contents of the bus into another register. As far as the bus transfer is concerned some of
the registers in the data path are read-only or write-only. In particular, MDATAOUT is write-only,
MDATAIN is read-only, MWE is write-only, ALUI1 and ALUI2 are write-only, and ALUO1 and
ALUO2 are read-only. The codes used to specify the source and destinations of the bus transfer are
the following.


    Code    Source       Destination
    0       IMDATA       MWE
    1       MADDR        MADDR
    2       MDATAIN      MDATAOUT
    3       VIDCONF      VIDCONF
    4-12    Regs 4-12    Regs 4-12
    13      COM          COM
    14      ALUO1        ALUI1
    15      ALUO2        ALUI2

The ALU operates on two eight-bit operands which come from the two ALU input registers and
deposits its result into one of the two ALU output registers. The carry-bit from the most significant
bit of the ALU is latched into a carry latch which can be used to control subsequent ALU operations.
The ALU op-code is specified in four parts. The first part is a five-bit field which specifies the
operation to be performed by the ALU. The second part is a single bit which specifies if the result of
the ALU operation is to be half-byte swapped. The third part is a single bit which specifies which of
the output registers is to be loaded with the result of the ALU operation. The fourth part is another
single bit which specifies if the operation is a conditional. If it is, then the carry from the previous
operation is used to abort the loading of the output register. The output register is loaded only if the
carry latch is not asserted.


Code    ALU Operation

0       NOP (Output register not loaded)
1       0
2       -1
2       ALUI1
3       ALUI2
4       NOT ALUI1
5       NOT ALUI2
6       -ALUI1
7       -ALUI2
10      ALUI1 + 1
11      ALUI2 + 1
12      ALUI1 - 1
13      ALUI2 - 1
14      ALUI1 + ALUI2
15      ALUI1 + ALUI2 + Carry
16      ALUI1 - ALUI2
17      ALUI1 - ALUI2 - Borrow
20      ALUI1 AND ALUI2
21      ALUI1 OR ALUI2
22      ALUI1 XOR ALUI2
23      ALUI1 OR ALUI2'
24      ALUI1 lshift 1 (Zero Insert)
25      ALUI2 lshift 1 (Zero Insert)
26      ALUI1 lshift 1 (One Insert)
27      ALUI2 lshift 1 (One Insert)
30      ALUI1 lrot 1
31      ALUI2 lrot 1
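The four-part ALU op-code described above can be illustrated with a short Python sketch. Only the field widths (five bits plus three single bits) come from the text; the bit positions chosen here, the reading of "half-byte swapped" as an exchange of the two four-bit halves, and the names `AluOp`, `decode_alu`, `half_byte_swap`, and `commit` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AluOp:
    operation: int      # five-bit ALU operation field
    swap: bool          # half-byte swap the result before loading
    out_select: int     # 0 or 1: which ALU output register is loaded
    conditional: bool   # if set, the load is aborted when carry is latched


def decode_alu(opcode: int) -> AluOp:
    """Split an 8-bit ALU op-code into its four fields.

    ASSUMPTION: operation in bits 7-3, then swap, output select and
    conditional in bits 2, 1 and 0. The text gives only the field widths.
    """
    return AluOp(operation=(opcode >> 3) & 0x1F,
                 swap=bool(opcode & 0x4),
                 out_select=(opcode >> 1) & 1,
                 conditional=bool(opcode & 0x1))


def half_byte_swap(value: int) -> int:
    """Exchange the two four-bit halves of an eight-bit value."""
    value &= 0xFF
    return ((value << 4) | (value >> 4)) & 0xFF


def commit(op: AluOp, result: int, carry_latch: bool, outputs: list[int]) -> None:
    """Load the selected output register unless a conditional op is aborted."""
    if op.conditional and carry_latch:
        return  # carry latch asserted: the output register keeps its old value
    outputs[op.out_select] = half_byte_swap(result) if op.swap else result & 0xFF
```

The `commit` step captures the rule stated in the text: a conditional operation loads its output register only when the carry latch from the previous operation is not asserted.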

9.4.5.  Video  buffer 

There is a single-bit video buffer on each chip. It consists of two shift registers which shift in
opposite directions (Figure 9-4). Each shift register is 96 bits long. The shift registers are controlled by
four signals. As seen before, DATALATCH latches the contents of the DATA wires into the
MDATAIN register. VIDSHIFTIN shifts one of the video latch bits into the Input Shift Register. The
contents of the VIDCONF register specify which bit gets shifted in: the contents of the two registers
are ANDed and the result ORed to produce the bit that gets shifted into video. VIDTRANSFER
transfers the contents of the Input Shift Register to the Output Shift Register. VIDSHIFTOUT shifts
out the top bit of the Output Shift Register to the VIDEO pin. It also enables the VIDEO pin, which
is otherwise held at tri-state.
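The behavior of the video buffer can be sketched as a toy Python model. This is an illustration under stated assumptions, not the chip's logic: the class and method names are hypothetical, the bit selection is read as a bitwise AND of MDATAIN with VIDCONF followed by an OR over the resulting bits, and bits are assumed to leave the Output Shift Register in the order they entered the Input Shift Register, since Figure 9-4 is not reproduced here.

```python
from collections import deque


class VideoBuffer:
    """Toy model of the single-bit video buffer (two 96-bit shift registers)."""

    LENGTH = 96

    def __init__(self) -> None:
        self.input_sr = deque([0] * self.LENGTH, maxlen=self.LENGTH)
        self.output_sr = deque([0] * self.LENGTH, maxlen=self.LENGTH)

    def vid_shift_in(self, mdatain: int, vidconf: int) -> None:
        # AND the two registers, then OR the resulting bits into one video bit.
        bit = 1 if (mdatain & vidconf) else 0
        # With maxlen set, appending on the right drops the oldest bit on the left.
        self.input_sr.append(bit)

    def vid_transfer(self) -> None:
        # Copy the whole Input Shift Register into the Output Shift Register.
        self.output_sr = deque(self.input_sr, maxlen=self.LENGTH)

    def vid_shift_out(self) -> int:
        # ASSUMPTION: the bit that entered first is the one driven onto VIDEO first.
        bit = self.output_sr.popleft()
        self.output_sr.append(0)
        return bit
```

Having two registers lets the next group of 96 bits be shifted in from memory while the previous group is still being shifted out to the VIDEO pin.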

9.4.6.  Pin  summary 

DChip has 64 pins, 38 of which are used for data and 23 for control. Three pins are reserved for
Power, Ground, and Substrate.

Of the 38 data pins, 16 are used to communicate with the four neighbors. Another 16 are required to
input the IMDATA and output the ADDRESS, both of which are eight-bit numbers. The memory
DATA wires require 4 pins and the WE wire requires 1 pin. Because this chip contains a single-bit
video buffer, there is only 1 VIDEO pin.

Of the control pins, 8 are used to input the ALU op code. There are 2 ADDRSELECT pins, 2 each
for COMIN and COMOUT, and 1 each for the DATAENABLE, DATALATCH, ADDRLATCH,
and VIDTRANSFER inputs. The main clocks for the datapath (φ1 and φ2) are input from different
pins, while the other three clocks, for the neighbor shifting and for the input and output shift registers, are
input from one pin each.
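The pin budget above can be checked with a short tally. The Python sketch below simply adds up the counts given in the text; the dictionary names and the grouping of signals into entries are choices made for the example.

```python
# Pin counts as listed in the pin summary above.
DATA_PINS = {
    "neighbor ports (four neighbors)": 16,
    "IMDATA input": 8,
    "ADDRESS output": 8,
    "memory DATA": 4,
    "WE": 1,
    "VIDEO": 1,
}
CONTROL_PINS = {
    "ALU op code": 8,
    "ADDRSELECT": 2,
    "COMIN": 2,
    "COMOUT": 2,
    "DATAENABLE, DATALATCH, ADDRLATCH, VIDTRANSFER": 4,
    "datapath clocks (two phases)": 2,
    "neighbor-shift, input-shift and output-shift clocks": 3,
}
SUPPLY_PINS = {"Power": 1, "Ground": 1, "Substrate": 1}

data = sum(DATA_PINS.values())
control = sum(CONTROL_PINS.values())
supply = sum(SUPPLY_PINS.values())
total = data + control + supply
print(data, control, supply, total)  # prints: 38 23 3 64
```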




9.5.  Simulation  and  performance 

Using the proposed design, this section presents the instruction sequences required in the inner
loops of the BitBlt copy and invert operations and estimates their performance. Both
inner loops assume that the masking required for the end conditions is handled elsewhere.

This discussion assumes a processor cycle time of approximately 200 ns. Assuming a memory cycle
time of 400 ns, a memory read or write takes 2 processor cycles. It is also assumed that a neighbor
communication step takes 200 ns. A two-dimensional end-around shift hence takes a maximum
of 4 processor cycles. If we assume the abstraction of the procedures Align(x1,y1,x2,y2),
MemoryRead(x,y), and MemoryWrite(x,y), which take 4, 2, and 2 processor cycles respectively, then
the inner loop of the BitBlt copy operation is:

BBCopyLoop:

    (1)  Align(x1,y1,x2,y2);  MemoryWrite(x2,y2);  x2 := x2+8;  y2 := y2+8;
    (2)
    (3)  MemoryRead(x1,y1);  x1 := x1+8;  y1 := y1+8;
    (4)
    (5)  MDATAOUT := COM;
    (6)  COM := MDATAIN;  GOTO BBCopyLoop;

The inner loop of the BitBlt copy operation comprises 6 microinstructions, which at the rate of 200
ns/instruction take 1.2 µs to execute. Scrolling a 768x1024 screen would hence take
approximately 15 ms, which is less than one frame time at a refresh rate of 60 frames/second.

The inner loop of the BitBlt invert operation does not require the Align operation if a display
region is being inverted in place. The loop is hence the following:

BBInvLoop:

    (1)  ALUI1 := MDATAIN;  MemoryWrite(x,y);  x := x+8;  y := y+8;
    (2)  ALUO1 := NOT ALUI1;
    (3)  MDATAOUT := ALUO1;  MemoryRead(x,y);
    (4)  GOTO BBInvLoop;

Using this four-instruction inner loop, a 768x1024 display can be inverted in place in approximately
10 ms!
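The two timing estimates follow from the same arithmetic, which the Python sketch below reproduces. It assumes, as the x+8 and y+8 increments in the loops suggest, that each pass through an inner loop updates one 8x8 square of pixels; the function and constant names are hypothetical.

```python
CYCLE_NS = 200             # processor cycle time assumed in the text
SQUARE = 8                 # ASSUMPTION: one 8x8 square is updated per loop pass
WIDTH, HEIGHT = 768, 1024  # display size used in the text


def full_screen_time_ms(instructions_per_loop: int) -> float:
    """Time to sweep the whole display with an inner loop of the given length."""
    iterations = (WIDTH * HEIGHT) // (SQUARE * SQUARE)
    return iterations * instructions_per_loop * CYCLE_NS / 1e6


copy_ms = full_screen_time_ms(6)    # BitBlt copy: six microinstructions
invert_ms = full_screen_time_ms(4)  # BitBlt invert: four microinstructions
frame_ms = 1000 / 60                # one frame time at 60 frames/second
```

With these numbers the display holds 12,288 squares, so the copy loop needs about 14.7 ms and the invert loop about 9.8 ms, both below the 16.7 ms frame time.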




9.6.  Design  scalability 

This section discusses the implications of scaling various parameters of the display chip design. These
parameters are:

o The size of the square of the memory organization. The size of the square determines the
number of pixels that can be updated in parallel.

o The size of each RAM chip. The size of the memory chip determines the total size of the
display for any given number of memory chips.

o The number of grayscale bits in each pixel. This value determines the number of
different shades that each point can portray.

The total number of memory chips in a display system using the MxM square memory organization
with g bits of gray-scale is gM². If each chip contains k bits, then the total number of bits in the
system is gkM².

The display chip contains 6 external pixel data paths: four for the neighbors, and one each for the
memory and the video (the current design only has one bit of video). It also contains 2 external
address paths (for IMDATA and ADDRESS), where each address is ln(k)/2 bits. Each display chip
hence uses 6g+ln(k)/2 pins, which will change with g and k.

Scaling the size of the square affects the number of pixels that can be updated simultaneously.
However, if the desire is to build a display with a given total area, then the size of individual memory
chips will change with a change in the size of the update square. If, for example, the size of the
update square is doubled (i.e. M := 2M), then the memory chips can be 1/4th the size, resulting in 2
fewer pins. Scaling the size of the memory chips has the converse effect. The biggest impact on the
display chip comes from the number of bits in the grayscale, because of the need for 6 pins per bit of
gray-scale. In fact, as g increases beyond a certain point, it is no longer possible to handle the whole pixel
in one display chip. A solution to this would be to split the pixel into a number of bit-sliced display
chips. This solution results in slower speeds for operations which have to propagate a carry through
the entire pixel value.
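The scaling relations above can be tried out numerically. The Python sketch below is an illustration under two assumptions: ln(k) in the text is read as the base-2 logarithm of the chip size, and k = 65,536 bits is used as the example chip size because it matches the eight-bit IMDATA and ADDRESS values given in the pin summary. The function names are hypothetical.

```python
def memory_chips(M: int, g: int) -> int:
    """Number of memory chips for an MxM square with g gray-scale bits."""
    return g * M * M


def total_bits(M: int, g: int, k: int) -> int:
    """Total display memory when each chip holds k bits."""
    return memory_chips(M, g) * k


def address_bits(k: int) -> int:
    """Width of one address path, read here as log2(k)/2 for k a power of four."""
    return (k.bit_length() - 1) // 2


# Doubling the update square at constant total memory quarters the chip size.
before = 2 * address_bits(65536)   # two address paths (IMDATA and ADDRESS)
after = 2 * address_bits(16384)
pins_saved = before - after
```

For these values `pins_saved` is 2, matching the "2 fewer pins" stated in the text, while `total_bits(8, 1, 65536)` and `total_bits(16, 1, 16384)` are equal.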




9.7.  Status 

DChip was fabricated using the MOSIS system provided by ISI. The chip was sent for fabrication
on October 31, 1981. Seven parts were returned on January 6, 1982. They were tested
using an elementary test rig and were found to be functionally correct. They were not tested for
timing and speed information.

This thesis does not address the design of the controller which would control the display memory
system. The controller design is as crucial as the design of the memory system, because it is up to the
controller to fully utilize the capabilities of the memory system.




Chapter 10
Conclusion

This thesis is a study of the feasibility of a display design in which the underlying primitive is the
parallel update of any small square region of the display. This primitive operation is justified by the
well known principle of locality, which states that if any display operation updates a given pixel then
the next pixel it updates will be in close proximity to the current location. Since the display is
inherently a two-dimensional device, these successive pixels are likely to be in a close
two-dimensional neighborhood, and can hence be updated in parallel using the square update primitive.
The conventional scan line approach allows the parallel update of only horizontal spans and is hence
convenient only for operations in which successively updated pixels lie along scan lines. This thesis
attempts to demonstrate that square updates are more desirable. It does so by looking at a wide range
of display applications and showing algorithms that can be implemented efficiently using square
updates.

The square organization is effective only if the size of the square being updated is approximately
equal to the size of the display objects. If the size of the objects is much larger than the size of the square,
then the scan line organization can achieve the same efficiency as the square organization (efficiency
is measured in terms of the number of pixels actually updated as a percentage of the number that can
be updated). So, for example, the scan line organization is as efficient as the square organization for
scrolling the contents of the whole display. Conversely, the 8x8 square organization is more efficient
than the 64x1 scan line organization for updating an 8x10 character. Since most operations call for the
update of small squarish objects, the square organization is indeed a justified choice for the display
organization. [Sutherland 81] contains a more detailed performance model justifying the advantages
of the 8x8 memory organization.
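The efficiency measure defined above can be evaluated for the two examples in the text. The Python sketch below is a simplified model, not the performance model of [Sutherland 81]: it assumes that an access can start at any pixel position, so that an object is covered by the smallest possible grid of accesses, and that the 8x10 character is 8 pixels wide and 10 pixels high. The function name is hypothetical.

```python
from math import ceil


def efficiency(obj_w: int, obj_h: int, acc_w: int, acc_h: int) -> float:
    """Pixels actually updated divided by pixels that the accesses could update.

    ASSUMPTION: accesses may start at any pixel, so an obj_w x obj_h object
    needs ceil(obj_w/acc_w) * ceil(obj_h/acc_h) accesses of acc_w x acc_h pixels.
    """
    accesses = ceil(obj_w / acc_w) * ceil(obj_h / acc_h)
    return (obj_w * obj_h) / (accesses * acc_w * acc_h)


char_square = efficiency(8, 10, 8, 8)         # 8x10 character, 8x8 squares
char_scanline = efficiency(8, 10, 64, 1)      # 8x10 character, 64x1 spans
screen_square = efficiency(768, 1024, 8, 8)   # whole display, 8x8 squares
screen_scanline = efficiency(768, 1024, 64, 1)
```

Under these assumptions the character is updated at 62.5% efficiency with 8x8 squares but only 12.5% with 64x1 spans, while both organizations reach 100% for the whole display.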

The BitBlt operation (defined in Chapter 4) can be implemented more efficiently for the square
display organization than for the scan line organization. This operation has been found useful for a large
class of display applications which operate on rectangular regions. It can also be used for
non-rectangular operations like drawing lines and filling polygons. This is done by precomputing
segments of all possible lines and polygons and putting them together to form the desired images.
This is a very powerful approach to achieving high performance for these operations and eliminates the
need for high speed processing. The algorithms which show that strokes can indeed be used for
graphics are the largest contribution of this research. These algorithms can be extended to grayscale
graphics to produce smooth edges.

The stroke algorithms can be adapted to display lines and polygons whose endpoints do not lie on
pixel centers. Non-integer endpoints are necessary to avoid dynamic aliasing, which causes jitter in a
moving image because of the quantization of endpoints to pixel centers. But the number of strokes
required when the endpoint restriction is removed is impractically large. Under these circumstances,
a parallel processing approach can be used to compute each stroke. The endpoints of lines and
polygons with non-integer endpoints require a square array of processors, the size of the square being
exactly the same as the size of the square which can be updated by the display architecture.

The square display architecture is also ideal for most low-level image processing algorithms.
Chapter 8 discusses a class of such algorithms, which are commonly used to present images in
more desirable formats. The aim of that presentation was not to discover new image processing
algorithms, but to show that existing algorithms can be easily adapted to provide high performance
using the square array of processors provided by our display design.

Chapter 9 presents the design of a simple processor, an array of which can be used to implement
the algorithms discussed for the various applications. The fact that an extremely simple
processor is sufficient for a very large class of applications is, in my view, the single most significant
contribution of this thesis.

10.1. Future designs

The design of the display chip, although simple, uses 64 pins. This can be easily reduced if the
memory chips can be modified slightly. The easiest and most beneficial modification would be to allow
the address incrementing required to access squares which overlap word boundaries. One extra pin in
the memory chip, which when asserted would increment the address by one, is sufficient to decrease
the pin count in the display chip by 18. The other desirable modification is to provide shift registers
which can store all the bits read by the sense amplifiers when one row of the memory array is
enabled. This data can be used serially for screen refresh, which would both increase the total
memory bandwidth available and decrease the pin count of the display chip.




The idea of using a square array of processors to provide the ability to update several pixels in
parallel can be extended to encompass a much larger range of applications than the ones discussed in
this thesis. In the area of computer graphics this thesis restricted itself to straight lines and edges.
Similar algorithms can be used for parametric curves, but they require better computing primitives in
each display chip.

And lastly, I would like to emphasize the need to build a display to really study the performance
tradeoffs. Theoretical models and simulations provide a lot of information but are not sufficient, for
two reasons. The first reason is that a real-life display usually presents surprises that the simulations
did not account for. But the second and more important reason is that the use of a display gives rise to
new and innovative uses of computer graphics, and these could completely change the performance
tradeoffs that the display design had assumed.


References 


[Ball 81]  Ball, E.
    Canvas: the Graphics Package for the Spice Personal Timesharing System.
    In Computer Graphics 81: Proceedings of the International Conference, pages 269-280. 1981.

[Baskett 76]  Baskett, Forest and Shustek, Leonard.
    The Design of a Low Cost Video Graphics Terminal.
    Computer Graphics 10(2):235-240, Summer, 1976.

[Bawden 77]  Bawden, A., et al.
    Lisp Machine Project Report.
    Technical Report AIM 444, M.I.T. A.I. Lab, Cambridge, Mass., August, 1977.

[Bechtolsheim 80]  Bechtolsheim, Andreas and Baskett, Forest.
    High-Performance Raster Graphics for Microcomputer Systems.
    Computer Graphics 14(3), July, 1980.

[Bresenham 65]  Bresenham, J.E.
    Algorithm for computer control of a digital plotter.
    IBM Systems Journal 4(1):25-30, July, 1965.

[Clark 80]  Clark, J.H., and Hannah, M.R.
    A High-Performance Smart Image Memory.
    Lambda, 3rd Quarter, 1980.

[Crow 76]  Crow, F.C.
    The Aliasing Problem in Computer-synthesized Shaded Images.
    PhD thesis, University of Utah, March, 1976.

[Denes 75]  Denes, P.B.
    A Scan-type Graphics System for Interactive Computing.
    Proceedings IEEE Conference on Computer Graphics, Pattern Recognition, and Data Structures :21, May, 1975.

[Feibush 80]  Feibush, Eliot A., Levoy, Mark, and Cook, Robert L.
    Synthetic Texturing Using Digital Filters.
    Computer Graphics 14(3):294-301, July, 1980.

[Jordan 74]  Jordan, B.W., Jr. and Barrett, R.C.
    A Cell Organized Raster Display for Line Drawings.
    CACM 17(2):676, February, 1974.

[Kajiya 75]  Kajiya James T., Sutherland Ivan E., and Cheadle Edward C.
    A Random-Access Video Frame Buffer.
    Proceedings IEEE Conference on Computer Graphics, Pattern Recognition, and Data Structures :1, May, 1975.


[Leler 80]  Leler, William J.
    Human Vision, Anti-aliasing, and the Cheap 4000 Line Display.
    Computer Graphics 14(3):308-313, July, 1980.

[Lucas 81]  Lucas, Bruce.
    Personal communication.

[McCracken 75]  McCracken T.H., Sherman B.W., Dwyer S.J. III.
    An economical Tonal Display for Interactive Graphics and Image Analysis Data.
    Computers & Graphics 1(1):79-94, 1975.

[Noll 71]  Noll, A. Michael.
    Scanned-Display Computer Graphics.
    CACM 14(3), March, 1971.

[Ophir 68]  Ophir D., Rankovitz S., Shepherd B.J., and Spinard R.J.
    BRAD: The Brookhaven Raster Display.
    CACM 11(6):415, June, 1968.

[Roberts 65]  Roberts L.G.
    Machine Perception in Three-dimensional Solids.
    Optical and Electrooptical Information Processing, MIT Press, Cambridge, Massachusetts :159-165, 1965.

[Rothstein 79]  Rothstein, J. and Weiman, C.F.R.
    Parallel and sequential specification of a context sensitive language for straight lines on grids.
    Computer Graphics and Image Processing 5, 1979.

[Sproull 79]  Sproull, Robert F.
    Raster Graphics for Interactive Programming Environments.
    Computer Graphics 13(2):83-93, August, 1979.

[Sproull 81]  Sproull Robert F.
    Using Program Transformations to Derive Line-Drawing Algorithms.
    Technical Report, Carnegie-Mellon University, Computer Science Department, 1981.

[Sutherland 70]  Sutherland, Ivan E.
    Computer Display.
    Scientific American, June, 1970.

[Sutherland 81]  Sproull R.F., Sutherland, I.E., Thompson A., Gupta, S., and Minter, C.
    The 8 by 8 Display.
    Technical Report, Carnegie-Mellon University, Computer Science Department, 1981.

[Terlet 67]  Terlet, J.H.
    The CRT display subsystem of the IBM 1500 instructional system.
    AFIPS Conference Proceedings 31, FJCC, 1967.




[Thacker 81]  Thacker, C.P., McCreight, E.M., Lampson, B.W., Sproull, R.F., and Boggs, D.R.
    Alto: A Personal Computer.
    In Siewiorek, D., Bell, C.G., and Newell, A. (editors), Computer Structures: Principles and Examples, second edition. McGraw-Hill, 1981.

[Warnock 80]  Warnock John E.
    The Display of Characters Using Gray Level Sample Arrays.
    Computer Graphics 14(3):302-307, July, 1980.

[Weiman 80]  Weiman, Carl F.R.
    Continuous Anti-Aliased Rotation and Zoom for Raster Images.
    Computer Graphics 14(3), July, 1980.


